(awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python -u main.py --epochs 500 --data data/March-2018_16mb --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1500 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 20 --emsize=400 --save models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
Applying weight drop of 0.1 to weight
[QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=800, out_features=4500)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1500, out_features=4500)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1500, out_features=4500)
  )
), QRNNLayer(
  (linear): WeightDrop(
    (module): Linear(in_features=1500, out_features=1200)
  )
)]
Args: Namespace(alpha=2, batch_size=20, beta=1, bptt=70, clip=0.25, cuda=True, data='data/March-2018_16mb', dropout=0.4, dropoute=0.1, dropouth=0.2, dropouti=0.4, emsize=400, epochs=500, log_interval=200, lr=30, model='QRNN', nhid=1500, nlayers=4, nonmono=5, save='models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt', seed=4002, tied=True, wdecay=1.2e-06, wdrop=0.1)
Model total parameters: 83198208
| epoch 1 | 200/ 1568 batches | lr 30.00 | ms/batch 290.15 | loss 9.53 | ppl 13791.22
| epoch 1 | 400/ 1568 batches | lr 30.00 | ms/batch 286.14 | loss 8.10 | ppl 3309.71
| epoch 1 | 600/ 1568 batches | lr 30.00 | ms/batch 280.38 | loss 7.90 | ppl 2705.63
| epoch 1 | 800/ 1568 batches | lr 30.00 | ms/batch 287.09 | loss 7.79 | ppl 2426.50
| epoch 1 | 1000/ 1568 batches | lr 30.00 | ms/batch 291.92 | loss 7.56 | ppl 1916.16
| epoch 1 | 1200/ 1568 batches | lr 30.00 | ms/batch 284.58 | loss 7.40 | ppl 1632.08
| epoch 1 | 1400/ 1568 batches | lr 30.00 | ms/batch 290.08 | loss 7.38 | ppl 1608.96
| epoch 1 | 1600/ 1568 batches | lr 30.00 | ms/batch 285.62 | loss 7.26 | ppl 1419.82
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 493.51s | valid loss 7.04 | valid ppl 1136.90
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 2 | 200/ 1568 batches | lr 30.00 | ms/batch 289.09 | loss 7.22 | ppl 1360.57
| epoch 2 | 400/ 1568 batches | lr 30.00 | ms/batch 287.82 | loss 7.18 | ppl 1307.11
| epoch 2 | 600/ 1568 batches | lr 30.00 | ms/batch 285.94 | loss 7.14 | ppl 1259.85
| epoch 2 | 800/ 1568 batches | lr 30.00 | ms/batch 285.20 | loss 7.13 | ppl 1245.13
| epoch 2 | 1000/ 1568 batches | lr 30.00 | ms/batch 290.31 | loss 7.07 | ppl 1171.78
| epoch 2 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.34 | loss 7.03 | ppl 1128.08
| epoch 2 | 1400/ 1568 batches | lr 30.00 | ms/batch 292.24 | loss 6.97 | ppl 1067.81
| epoch 2 | 1600/ 1568 batches | lr 30.00 | ms/batch 288.31 | loss 6.87 | ppl 961.72
-----------------------------------------------------------------------------------------
| end of epoch 2 | time: 494.80s | valid loss 6.87 | valid ppl 964.90
-----------------------------------------------------------------------------------------
Saving Normal!
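The four `Linear` shapes printed in the model summary at the top of the run account for only part of the reported 83,198,208 parameters. The sketch below totals the recurrent layers, assuming each `Linear` carries a bias (the PyTorch default; the printed repr does not say). Reading 4500 = 3 × 1500 and 1200 = 3 × 400 as three gate blocks per layer, and 800 = 2 × 400 as a two-step input window on the first layer, is our interpretation of the shapes, not something the log states. The remainder presumably sits in the tied embedding/softmax weights and any loss-module parameters; the vocabulary size is not shown in this log.

```python
# (in_features, out_features) of each QRNN layer's Linear, as printed in the log.
QRNN_LINEARS = [(800, 4500), (1500, 4500), (1500, 4500), (1500, 1200)]
TOTAL_REPORTED = 83_198_208  # "Model total parameters" line


def linear_params(n_in, n_out, bias=True):
    """Weights plus (assumed) bias of one nn.Linear layer."""
    return n_in * n_out + (n_out if bias else 0)


recurrent = sum(linear_params(i, o) for i, o in QRNN_LINEARS)
print(recurrent)                   # 18914700
print(TOTAL_REPORTED - recurrent)  # 64283508 left for embedding/softmax and the rest
```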
| epoch 3 | 200/ 1568 batches | lr 30.00 | ms/batch 290.72 | loss 6.88 | ppl 967.84
| epoch 3 | 400/ 1568 batches | lr 30.00 | ms/batch 284.22 | loss 6.77 | ppl 870.32
| epoch 3 | 600/ 1568 batches | lr 30.00 | ms/batch 287.82 | loss 6.80 | ppl 895.95
| epoch 3 | 800/ 1568 batches | lr 30.00 | ms/batch 285.21 | loss 6.79 | ppl 885.60
| epoch 3 | 1000/ 1568 batches | lr 30.00 | ms/batch 285.60 | loss 6.76 | ppl 863.93
| epoch 3 | 1200/ 1568 batches | lr 30.00 | ms/batch 291.01 | loss 6.72 | ppl 829.09
| epoch 3 | 1400/ 1568 batches | lr 30.00 | ms/batch 290.11 | loss 6.68 | ppl 797.95
| epoch 3 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.19 | loss 6.66 | ppl 780.91
-----------------------------------------------------------------------------------------
| end of epoch 3 | time: 491.99s | valid loss 6.43 | valid ppl 623.10
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 4 | 200/ 1568 batches | lr 30.00 | ms/batch 283.51 | loss 6.62 | ppl 753.55
| epoch 4 | 400/ 1568 batches | lr 30.00 | ms/batch 287.15 | loss 6.55 | ppl 701.06
| epoch 4 | 600/ 1568 batches | lr 30.00 | ms/batch 289.56 | loss 6.59 | ppl 730.02
| epoch 4 | 800/ 1568 batches | lr 30.00 | ms/batch 285.18 | loss 6.55 | ppl 701.16
| epoch 4 | 1000/ 1568 batches | lr 30.00 | ms/batch 292.66 | loss 6.56 | ppl 707.80
| epoch 4 | 1200/ 1568 batches | lr 30.00 | ms/batch 288.02 | loss 6.52 | ppl 676.35
| epoch 4 | 1400/ 1568 batches | lr 30.00 | ms/batch 283.98 | loss 6.51 | ppl 672.14
| epoch 4 | 1600/ 1568 batches | lr 30.00 | ms/batch 290.28 | loss 6.45 | ppl 632.88
-----------------------------------------------------------------------------------------
| end of epoch 4 | time: 491.90s | valid loss 6.30 | valid ppl 546.87
-----------------------------------------------------------------------------------------
Saving Normal!
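Each `ppl` value in this log should be the exponential of the unrounded cross-entropy `loss` printed beside it (perplexity = e^loss). Since the loss is printed to two decimals, ln(ppl) should land within 0.005 of it. A quick consistency check on a few pairs copied from the log (the helper name `consistent` is ours):

```python
import math

# (loss, ppl) pairs copied from the log.
PAIRS = [
    (9.53, 13791.22),  # epoch 1, batch 200 (training)
    (7.04, 1136.90),   # end of epoch 1 (validation)
    (5.35, 210.61),    # end of epoch 34 (validation)
    (5.03, 153.45),    # end of epoch 38 (validation)
]


def consistent(loss, ppl, decimals=2):
    """True if ln(ppl) rounds to the printed loss at the given precision."""
    return abs(math.log(ppl) - loss) <= 0.5 * 10 ** -decimals + 1e-9


for loss, ppl in PAIRS:
    print(f"loss={loss:.2f} ln(ppl)={math.log(ppl):.4f} ok={consistent(loss, ppl)}")
```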
| epoch 5 | 200/ 1568 batches | lr 30.00 | ms/batch 289.18 | loss 6.46 | ppl 639.59
| epoch 5 | 400/ 1568 batches | lr 30.00 | ms/batch 288.67 | loss 6.42 | ppl 612.29
| epoch 5 | 600/ 1568 batches | lr 30.00 | ms/batch 284.84 | loss 6.42 | ppl 614.78
| epoch 5 | 800/ 1568 batches | lr 30.00 | ms/batch 285.50 | loss 6.41 | ppl 609.25
| epoch 5 | 1000/ 1568 batches | lr 30.00 | ms/batch 286.15 | loss 6.43 | ppl 621.17
| epoch 5 | 1200/ 1568 batches | lr 30.00 | ms/batch 280.54 | loss 6.37 | ppl 582.55
| epoch 5 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.93 | loss 6.38 | ppl 592.80
| epoch 5 | 1600/ 1568 batches | lr 30.00 | ms/batch 286.59 | loss 6.34 | ppl 565.66
-----------------------------------------------------------------------------------------
| end of epoch 5 | time: 491.09s | valid loss 6.60 | valid ppl 735.60
-----------------------------------------------------------------------------------------
| epoch 6 | 200/ 1568 batches | lr 30.00 | ms/batch 289.15 | loss 6.35 | ppl 570.23
| epoch 6 | 400/ 1568 batches | lr 30.00 | ms/batch 282.99 | loss 6.30 | ppl 542.95
| epoch 6 | 600/ 1568 batches | lr 30.00 | ms/batch 291.00 | loss 6.31 | ppl 549.98
| epoch 6 | 800/ 1568 batches | lr 30.00 | ms/batch 290.01 | loss 6.30 | ppl 545.94
| epoch 6 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.64 | loss 6.30 | ppl 544.77
| epoch 6 | 1200/ 1568 batches | lr 30.00 | ms/batch 288.92 | loss 6.28 | ppl 534.71
| epoch 6 | 1400/ 1568 batches | lr 30.00 | ms/batch 287.04 | loss 6.27 | ppl 527.80
| epoch 6 | 1600/ 1568 batches | lr 30.00 | ms/batch 287.86 | loss 6.22 | ppl 503.78
-----------------------------------------------------------------------------------------
| end of epoch 6 | time: 492.53s | valid loss 6.06 | valid ppl 428.68
-----------------------------------------------------------------------------------------
Saving Normal!
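`Saving Normal!` appears only at epochs whose validation perplexity beats every earlier one: epoch 5 (735.60, worse than epoch 4's 546.87) has no save line. A minimal sketch of that save-on-improvement rule, replayed on the validation perplexities of epochs 1 to 6; comparing perplexity is equivalent to comparing loss because exp is monotonic. The function name is hypothetical.

```python
# Validation perplexities from the "end of epoch" lines, epochs 1 to 6.
VALID_PPL_1_TO_6 = {1: 1136.90, 2: 964.90, 3: 623.10, 4: 546.87, 5: 735.60, 6: 428.68}


def checkpoint_epochs(valid_ppl):
    """Epochs at which a save-on-improvement rule would write a checkpoint."""
    best = float("inf")
    saved = []
    for epoch in sorted(valid_ppl):
        if valid_ppl[epoch] < best:  # strictly better than every earlier epoch
            best = valid_ppl[epoch]
            saved.append(epoch)
    return saved


print(checkpoint_epochs(VALID_PPL_1_TO_6))  # [1, 2, 3, 4, 6]
```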
| epoch 7 | 200/ 1568 batches | lr 30.00 | ms/batch 292.16 | loss 6.25 | ppl 519.83
| epoch 7 | 400/ 1568 batches | lr 30.00 | ms/batch 289.68 | loss 6.23 | ppl 509.07
| epoch 7 | 600/ 1568 batches | lr 30.00 | ms/batch 292.98 | loss 6.21 | ppl 497.66
| epoch 7 | 800/ 1568 batches | lr 30.00 | ms/batch 288.05 | loss 6.21 | ppl 496.82
| epoch 7 | 1000/ 1568 batches | lr 30.00 | ms/batch 292.02 | loss 6.22 | ppl 500.77
| epoch 7 | 1200/ 1568 batches | lr 30.00 | ms/batch 292.72 | loss 6.18 | ppl 485.23
| epoch 7 | 1400/ 1568 batches | lr 30.00 | ms/batch 282.70 | loss 6.15 | ppl 469.94
| epoch 7 | 1600/ 1568 batches | lr 30.00 | ms/batch 290.10 | loss 6.10 | ppl 447.85
-----------------------------------------------------------------------------------------
| end of epoch 7 | time: 492.48s | valid loss 6.01 | valid ppl 406.51
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 8 | 200/ 1568 batches | lr 30.00 | ms/batch 285.87 | loss 6.16 | ppl 472.78
| epoch 8 | 400/ 1568 batches | lr 30.00 | ms/batch 284.91 | loss 6.11 | ppl 448.31
| epoch 8 | 600/ 1568 batches | lr 30.00 | ms/batch 285.99 | loss 6.15 | ppl 469.73
| epoch 8 | 800/ 1568 batches | lr 30.00 | ms/batch 286.85 | loss 6.12 | ppl 452.68
| epoch 8 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.62 | loss 6.14 | ppl 465.18
| epoch 8 | 1200/ 1568 batches | lr 30.00 | ms/batch 291.38 | loss 6.12 | ppl 454.89
| epoch 8 | 1400/ 1568 batches | lr 30.00 | ms/batch 285.35 | loss 6.11 | ppl 452.31
| epoch 8 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.48 | loss 6.04 | ppl 421.69
-----------------------------------------------------------------------------------------
| end of epoch 8 | time: 492.49s | valid loss 5.96 | valid ppl 389.44
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 9 | 200/ 1568 batches | lr 30.00 | ms/batch 287.83 | loss 6.09 | ppl 440.34
| epoch 9 | 400/ 1568 batches | lr 30.00 | ms/batch 285.70 | loss 6.05 | ppl 423.00
| epoch 9 | 600/ 1568 batches | lr 30.00 | ms/batch 286.71 | loss 6.09 | ppl 441.39
| epoch 9 | 800/ 1568 batches | lr 30.00 | ms/batch 287.49 | loss 6.04 | ppl 418.01
| epoch 9 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.99 | loss 6.09 | ppl 440.87
| epoch 9 | 1200/ 1568 batches | lr 30.00 | ms/batch 285.81 | loss 6.02 | ppl 413.52
| epoch 9 | 1400/ 1568 batches | lr 30.00 | ms/batch 290.15 | loss 6.02 | ppl 410.58
| epoch 9 | 1600/ 1568 batches | lr 30.00 | ms/batch 285.42 | loss 5.97 | ppl 392.46
-----------------------------------------------------------------------------------------
| end of epoch 9 | time: 491.99s | valid loss 6.03 | valid ppl 414.30
-----------------------------------------------------------------------------------------
| epoch 10 | 200/ 1568 batches | lr 30.00 | ms/batch 292.74 | loss 6.02 | ppl 409.63
| epoch 10 | 400/ 1568 batches | lr 30.00 | ms/batch 280.46 | loss 5.99 | ppl 397.66
| epoch 10 | 600/ 1568 batches | lr 30.00 | ms/batch 286.01 | loss 5.98 | ppl 395.27
| epoch 10 | 800/ 1568 batches | lr 30.00 | ms/batch 286.75 | loss 5.98 | ppl 393.50
| epoch 10 | 1000/ 1568 batches | lr 30.00 | ms/batch 284.63 | loss 6.01 | ppl 406.39
| epoch 10 | 1200/ 1568 batches | lr 30.00 | ms/batch 289.28 | loss 5.97 | ppl 392.62
| epoch 10 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.02 | loss 5.97 | ppl 392.29
| epoch 10 | 1600/ 1568 batches | lr 30.00 | ms/batch 288.24 | loss 5.93 | ppl 375.28
-----------------------------------------------------------------------------------------
| end of epoch 10 | time: 492.13s | valid loss 5.81 | valid ppl 333.82
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 11 | 200/ 1568 batches | lr 30.00 | ms/batch 291.00 | loss 5.96 | ppl 389.01
| epoch 11 | 400/ 1568 batches | lr 30.00 | ms/batch 288.55 | loss 5.94 | ppl 379.83
| epoch 11 | 600/ 1568 batches | lr 30.00 | ms/batch 288.83 | loss 5.94 | ppl 378.84
| epoch 11 | 800/ 1568 batches | lr 30.00 | ms/batch 290.10 | loss 5.94 | ppl 379.41
| epoch 11 | 1000/ 1568 batches | lr 30.00 | ms/batch 290.16 | loss 5.97 | ppl 390.28
| epoch 11 | 1200/ 1568 batches | lr 30.00 | ms/batch 289.77 | loss 5.91 | ppl 368.66
| epoch 11 | 1400/ 1568 batches | lr 30.00 | ms/batch 286.85 | loss 5.91 | ppl 368.95
| epoch 11 | 1600/ 1568 batches | lr 30.00 | ms/batch 284.44 | loss 5.85 | ppl 348.30
-----------------------------------------------------------------------------------------
| end of epoch 11 | time: 492.62s | valid loss 5.91 | valid ppl 369.69
-----------------------------------------------------------------------------------------
| epoch 12 | 200/ 1568 batches | lr 30.00 | ms/batch 290.26 | loss 5.90 | ppl 366.08
| epoch 12 | 400/ 1568 batches | lr 30.00 | ms/batch 284.69 | loss 5.88 | ppl 358.59
| epoch 12 | 600/ 1568 batches | lr 30.00 | ms/batch 286.23 | loss 5.88 | ppl 356.38
| epoch 12 | 800/ 1568 batches | lr 30.00 | ms/batch 288.29 | loss 5.87 | ppl 353.76
| epoch 12 | 1000/ 1568 batches | lr 30.00 | ms/batch 286.58 | loss 5.90 | ppl 363.63
| epoch 12 | 1200/ 1568 batches | lr 30.00 | ms/batch 292.59 | loss 5.86 | ppl 351.45
| epoch 12 | 1400/ 1568 batches | lr 30.00 | ms/batch 291.95 | loss 5.86 | ppl 352.40
| epoch 12 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.40 | loss 5.81 | ppl 333.18
-----------------------------------------------------------------------------------------
| end of epoch 12 | time: 492.44s | valid loss 5.79 | valid ppl 326.58
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 13 | 200/ 1568 batches | lr 30.00 | ms/batch 292.70 | loss 5.87 | ppl 353.01
| epoch 13 | 400/ 1568 batches | lr 30.00 | ms/batch 285.49 | loss 5.84 | ppl 342.58
| epoch 13 | 600/ 1568 batches | lr 30.00 | ms/batch 286.24 | loss 5.82 | ppl 337.76
| epoch 13 | 800/ 1568 batches | lr 30.00 | ms/batch 288.30 | loss 5.85 | ppl 345.54
| epoch 13 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.75 | loss 5.86 | ppl 350.31
| epoch 13 | 1200/ 1568 batches | lr 30.00 | ms/batch 285.16 | loss 5.82 | ppl 337.16
| epoch 13 | 1400/ 1568 batches | lr 30.00 | ms/batch 293.21 | loss 5.82 | ppl 337.78
| epoch 13 | 1600/ 1568 batches | lr 30.00 | ms/batch 288.06 | loss 5.76 | ppl 316.98
-----------------------------------------------------------------------------------------
| end of epoch 13 | time: 492.20s | valid loss 5.87 | valid ppl 355.47
-----------------------------------------------------------------------------------------
| epoch 14 | 200/ 1568 batches | lr 30.00 | ms/batch 288.65 | loss 5.82 | ppl 335.61
| epoch 14 | 400/ 1568 batches | lr 30.00 | ms/batch 285.17 | loss 5.78 | ppl 322.37
| epoch 14 | 600/ 1568 batches | lr 30.00 | ms/batch 284.07 | loss 5.77 | ppl 320.65
| epoch 14 | 800/ 1568 batches | lr 30.00 | ms/batch 287.54 | loss 5.78 | ppl 323.35
| epoch 14 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.03 | loss 5.81 | ppl 335.18
| epoch 14 | 1200/ 1568 batches | lr 30.00 | ms/batch 290.21 | loss 5.76 | ppl 318.59
| epoch 14 | 1400/ 1568 batches | lr 30.00 | ms/batch 287.36 | loss 5.75 | ppl 315.35
| epoch 14 | 1600/ 1568 batches | lr 30.00 | ms/batch 291.66 | loss 5.73 | ppl 307.44
-----------------------------------------------------------------------------------------
| end of epoch 14 | time: 492.80s | valid loss 5.63 | valid ppl 277.95
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 15 | 200/ 1568 batches | lr 30.00 | ms/batch 288.43 | loss 5.77 | ppl 319.81
| epoch 15 | 400/ 1568 batches | lr 30.00 | ms/batch 287.07 | loss 5.76 | ppl 317.85
| epoch 15 | 600/ 1568 batches | lr 30.00 | ms/batch 285.03 | loss 5.73 | ppl 309.25
| epoch 15 | 800/ 1568 batches | lr 30.00 | ms/batch 287.90 | loss 5.74 | ppl 310.46
| epoch 15 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.21 | loss 5.77 | ppl 319.09
| epoch 15 | 1200/ 1568 batches | lr 30.00 | ms/batch 286.73 | loss 5.73 | ppl 308.47
| epoch 15 | 1400/ 1568 batches | lr 30.00 | ms/batch 292.96 | loss 5.73 | ppl 308.59
| epoch 15 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.44 | loss 5.67 | ppl 289.59
-----------------------------------------------------------------------------------------
| end of epoch 15 | time: 492.13s | valid loss 5.62 | valid ppl 275.50
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 16 | 200/ 1568 batches | lr 30.00 | ms/batch 289.60 | loss 5.72 | ppl 304.96
| epoch 16 | 400/ 1568 batches | lr 30.00 | ms/batch 290.57 | loss 5.72 | ppl 304.73
| epoch 16 | 600/ 1568 batches | lr 30.00 | ms/batch 288.21 | loss 5.69 | ppl 296.58
| epoch 16 | 800/ 1568 batches | lr 30.00 | ms/batch 292.55 | loss 5.71 | ppl 300.84
| epoch 16 | 1000/ 1568 batches | lr 30.00 | ms/batch 289.76 | loss 5.72 | ppl 305.39
| epoch 16 | 1200/ 1568 batches | lr 30.00 | ms/batch 287.39 | loss 5.66 | ppl 287.19
| epoch 16 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.49 | loss 5.68 | ppl 292.34
| epoch 16 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.11 | loss 5.63 | ppl 280.01
-----------------------------------------------------------------------------------------
| end of epoch 16 | time: 492.24s | valid loss 5.62 | valid ppl 276.52
-----------------------------------------------------------------------------------------
| epoch 17 | 200/ 1568 batches | lr 30.00 | ms/batch 290.88 | loss 5.69 | ppl 294.85
| epoch 17 | 400/ 1568 batches | lr 30.00 | ms/batch 286.61 | loss 5.68 | ppl 291.92
| epoch 17 | 600/ 1568 batches | lr 30.00 | ms/batch 290.36 | loss 5.66 | ppl 286.14
| epoch 17 | 800/ 1568 batches | lr 30.00 | ms/batch 292.67 | loss 5.66 | ppl 288.02
| epoch 17 | 1000/ 1568 batches | lr 30.00 | ms/batch 291.82 | loss 5.70 | ppl 300.19
| epoch 17 | 1200/ 1568 batches | lr 30.00 | ms/batch 285.16 | loss 5.64 | ppl 280.20
| epoch 17 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.34 | loss 5.63 | ppl 279.34
| epoch 17 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.93 | loss 5.58 | ppl 266.18
-----------------------------------------------------------------------------------------
| end of epoch 17 | time: 492.15s | valid loss 5.56 | valid ppl 258.71
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 18 | 200/ 1568 batches | lr 30.00 | ms/batch 290.24 | loss 5.66 | ppl 287.98
| epoch 18 | 400/ 1568 batches | lr 30.00 | ms/batch 289.68 | loss 5.65 | ppl 285.65
| epoch 18 | 600/ 1568 batches | lr 30.00 | ms/batch 290.44 | loss 5.63 | ppl 279.82
| epoch 18 | 800/ 1568 batches | lr 30.00 | ms/batch 290.35 | loss 5.63 | ppl 279.03
| epoch 18 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.68 | loss 5.65 | ppl 284.84
| epoch 18 | 1200/ 1568 batches | lr 30.00 | ms/batch 288.42 | loss 5.61 | ppl 273.50
| epoch 18 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.33 | loss 5.60 | ppl 271.27
| epoch 18 | 1600/ 1568 batches | lr 30.00 | ms/batch 290.22 | loss 5.56 | ppl 260.20
-----------------------------------------------------------------------------------------
| end of epoch 18 | time: 492.72s | valid loss 5.50 | valid ppl 245.84
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 19 | 200/ 1568 batches | lr 30.00 | ms/batch 289.38 | loss 5.63 | ppl 277.39
| epoch 19 | 400/ 1568 batches | lr 30.00 | ms/batch 291.01 | loss 5.62 | ppl 275.17
| epoch 19 | 600/ 1568 batches | lr 30.00 | ms/batch 287.21 | loss 5.59 | ppl 267.97
| epoch 19 | 800/ 1568 batches | lr 30.00 | ms/batch 281.49 | loss 5.57 | ppl 262.43
| epoch 19 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.55 | loss 5.63 | ppl 278.90
| epoch 19 | 1200/ 1568 batches | lr 30.00 | ms/batch 284.02 | loss 5.57 | ppl 262.80
| epoch 19 | 1400/ 1568 batches | lr 30.00 | ms/batch 284.14 | loss 5.56 | ppl 261.09
| epoch 19 | 1600/ 1568 batches | lr 30.00 | ms/batch 286.54 | loss 5.54 | ppl 255.03
-----------------------------------------------------------------------------------------
| end of epoch 19 | time: 492.81s | valid loss 5.48 | valid ppl 239.88
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 20 | 200/ 1568 batches | lr 30.00 | ms/batch 290.01 | loss 5.58 | ppl 263.83
| epoch 20 | 400/ 1568 batches | lr 30.00 | ms/batch 290.28 | loss 5.58 | ppl 266.33
| epoch 20 | 600/ 1568 batches | lr 30.00 | ms/batch 290.70 | loss 5.56 | ppl 258.89
| epoch 20 | 800/ 1568 batches | lr 30.00 | ms/batch 289.50 | loss 5.56 | ppl 259.72
| epoch 20 | 1000/ 1568 batches | lr 30.00 | ms/batch 290.99 | loss 5.60 | ppl 271.35
| epoch 20 | 1200/ 1568 batches | lr 30.00 | ms/batch 293.33 | loss 5.56 | ppl 259.45
| epoch 20 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.23 | loss 5.54 | ppl 253.71
| epoch 20 | 1600/ 1568 batches | lr 30.00 | ms/batch 290.34 | loss 5.48 | ppl 239.35
-----------------------------------------------------------------------------------------
| end of epoch 20 | time: 493.34s | valid loss 5.45 | valid ppl 233.70
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 21 | 200/ 1568 batches | lr 30.00 | ms/batch 290.52 | loss 5.55 | ppl 256.36
| epoch 21 | 400/ 1568 batches | lr 30.00 | ms/batch 287.38 | loss 5.55 | ppl 257.77
| epoch 21 | 600/ 1568 batches | lr 30.00 | ms/batch 283.61 | loss 5.52 | ppl 249.86
| epoch 21 | 800/ 1568 batches | lr 30.00 | ms/batch 286.15 | loss 5.53 | ppl 251.60
| epoch 21 | 1000/ 1568 batches | lr 30.00 | ms/batch 282.16 | loss 5.55 | ppl 257.34
| epoch 21 | 1200/ 1568 batches | lr 30.00 | ms/batch 293.66 | loss 5.51 | ppl 248.06
| epoch 21 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.96 | loss 5.51 | ppl 246.09
| epoch 21 | 1600/ 1568 batches | lr 30.00 | ms/batch 287.69 | loss 5.46 | ppl 234.31
-----------------------------------------------------------------------------------------
| end of epoch 21 | time: 490.76s | valid loss 5.48 | valid ppl 239.05
-----------------------------------------------------------------------------------------
| epoch 22 | 200/ 1568 batches | lr 30.00 | ms/batch 291.12 | loss 5.52 | ppl 248.49
| epoch 22 | 400/ 1568 batches | lr 30.00 | ms/batch 288.90 | loss 5.51 | ppl 247.87
| epoch 22 | 600/ 1568 batches | lr 30.00 | ms/batch 282.44 | loss 5.49 | ppl 243.00
| epoch 22 | 800/ 1568 batches | lr 30.00 | ms/batch 286.70 | loss 5.49 | ppl 242.59
| epoch 22 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.34 | loss 5.54 | ppl 254.24
| epoch 22 | 1200/ 1568 batches | lr 30.00 | ms/batch 289.25 | loss 5.48 | ppl 240.00
| epoch 22 | 1400/ 1568 batches | lr 30.00 | ms/batch 284.45 | loss 5.48 | ppl 239.76
| epoch 22 | 1600/ 1568 batches | lr 30.00 | ms/batch 292.40 | loss 5.44 | ppl 230.05
-----------------------------------------------------------------------------------------
| end of epoch 22 | time: 491.93s | valid loss 5.40 | valid ppl 221.28
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 23 | 200/ 1568 batches | lr 30.00 | ms/batch 286.45 | loss 5.49 | ppl 241.10
| epoch 23 | 400/ 1568 batches | lr 30.00 | ms/batch 290.94 | loss 5.50 | ppl 245.89
| epoch 23 | 600/ 1568 batches | lr 30.00 | ms/batch 286.28 | loss 5.47 | ppl 237.10
| epoch 23 | 800/ 1568 batches | lr 30.00 | ms/batch 286.10 | loss 5.45 | ppl 232.87
| epoch 23 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.46 | loss 5.51 | ppl 245.97
| epoch 23 | 1200/ 1568 batches | lr 30.00 | ms/batch 290.31 | loss 5.46 | ppl 234.87
| epoch 23 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.39 | loss 5.45 | ppl 232.77
| epoch 23 | 1600/ 1568 batches | lr 30.00 | ms/batch 289.37 | loss 5.40 | ppl 221.03
-----------------------------------------------------------------------------------------
| end of epoch 23 | time: 491.27s | valid loss 5.39 | valid ppl 218.76
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 24 | 200/ 1568 batches | lr 30.00 | ms/batch 289.15 | loss 5.45 | ppl 232.86
| epoch 24 | 400/ 1568 batches | lr 30.00 | ms/batch 282.27 | loss 5.45 | ppl 232.00
| epoch 24 | 600/ 1568 batches | lr 30.00 | ms/batch 292.86 | loss 5.44 | ppl 230.64
| epoch 24 | 800/ 1568 batches | lr 30.00 | ms/batch 286.12 | loss 5.43 | ppl 227.67
| epoch 24 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.26 | loss 5.47 | ppl 237.99
| epoch 24 | 1200/ 1568 batches | lr 30.00 | ms/batch 286.76 | loss 5.43 | ppl 229.22
| epoch 24 | 1400/ 1568 batches | lr 30.00 | ms/batch 289.84 | loss 5.41 | ppl 224.23
| epoch 24 | 1600/ 1568 batches | lr 30.00 | ms/batch 284.51 | loss 5.38 | ppl 217.60
-----------------------------------------------------------------------------------------
| end of epoch 24 | time: 491.71s | valid loss 5.36 | valid ppl 213.49
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 25 | 200/ 1568 batches | lr 30.00 | ms/batch 290.62 | loss 5.44 | ppl 229.76
| epoch 25 | 400/ 1568 batches | lr 30.00 | ms/batch 286.43 | loss 5.44 | ppl 231.11
| epoch 25 | 600/ 1568 batches | lr 30.00 | ms/batch 282.06 | loss 5.42 | ppl 224.80
| epoch 25 | 800/ 1568 batches | lr 30.00 | ms/batch 286.64 | loss 5.39 | ppl 219.96
| epoch 25 | 1000/ 1568 batches | lr 30.00 | ms/batch 292.59 | loss 5.47 | ppl 236.83
| epoch 25 | 1200/ 1568 batches | lr 30.00 | ms/batch 284.43 | loss 5.40 | ppl 222.32
| epoch 25 | 1400/ 1568 batches | lr 30.00 | ms/batch 284.74 | loss 5.39 | ppl 218.48
| epoch 25 | 1600/ 1568 batches | lr 30.00 | ms/batch 286.86 | loss 5.36 | ppl 212.70
-----------------------------------------------------------------------------------------
| end of epoch 25 | time: 492.81s | valid loss 5.34 | valid ppl 207.61
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 26 | 200/ 1568 batches | lr 30.00 | ms/batch 289.68 | loss 5.41 | ppl 224.43
| epoch 26 | 400/ 1568 batches | lr 30.00 | ms/batch 291.74 | loss 5.41 | ppl 223.05
| epoch 26 | 600/ 1568 batches | lr 30.00 | ms/batch 288.43 | loss 5.40 | ppl 221.24
| epoch 26 | 800/ 1568 batches | lr 30.00 | ms/batch 286.92 | loss 5.38 | ppl 217.67
| epoch 26 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.64 | loss 5.42 | ppl 226.44
| epoch 26 | 1200/ 1568 batches | lr 30.00 | ms/batch 283.47 | loss 5.38 | ppl 217.00
| epoch 26 | 1400/ 1568 batches | lr 30.00 | ms/batch 285.87 | loss 5.35 | ppl 211.39
| epoch 26 | 1600/ 1568 batches | lr 30.00 | ms/batch 287.72 | loss 5.33 | ppl 206.09
-----------------------------------------------------------------------------------------
| end of epoch 26 | time: 492.25s | valid loss 5.37 | valid ppl 213.98
-----------------------------------------------------------------------------------------
| epoch 27 | 200/ 1568 batches | lr 30.00 | ms/batch 287.83 | loss 5.38 | ppl 216.02
| epoch 27 | 400/ 1568 batches | lr 30.00 | ms/batch 285.10 | loss 5.37 | ppl 215.46
| epoch 27 | 600/ 1568 batches | lr 30.00 | ms/batch 290.50 | loss 5.38 | ppl 216.20
| epoch 27 | 800/ 1568 batches | lr 30.00 | ms/batch 285.67 | loss 5.35 | ppl 211.28
| epoch 27 | 1000/ 1568 batches | lr 30.00 | ms/batch 287.15 | loss 5.38 | ppl 218.09
| epoch 27 | 1200/ 1568 batches | lr 30.00 | ms/batch 286.36 | loss 5.35 | ppl 210.07
| epoch 27 | 1400/ 1568 batches | lr 30.00 | ms/batch 290.05 | loss 5.34 | ppl 207.48
| epoch 27 | 1600/ 1568 batches | lr 30.00 | ms/batch 283.03 | loss 5.28 | ppl 197.13
-----------------------------------------------------------------------------------------
| end of epoch 27 | time: 491.08s | valid loss 5.30 | valid ppl 200.37
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 28 | 200/ 1568 batches | lr 30.00 | ms/batch 286.01 | loss 5.34 | ppl 209.31
| epoch 28 | 400/ 1568 batches | lr 30.00 | ms/batch 287.06 | loss 5.34 | ppl 208.87
| epoch 28 | 600/ 1568 batches | lr 30.00 | ms/batch 286.42 | loss 5.34 | ppl 207.52
| epoch 28 | 800/ 1568 batches | lr 30.00 | ms/batch 285.20 | loss 5.33 | ppl 207.20
| epoch 28 | 1000/ 1568 batches | lr 30.00 | ms/batch 285.03 | loss 5.37 | ppl 215.19
| epoch 28 | 1200/ 1568 batches | lr 30.00 | ms/batch 288.69 | loss 5.32 | ppl 204.78
| epoch 28 | 1400/ 1568 batches | lr 30.00 | ms/batch 287.23 | loss 5.31 | ppl 202.09
| epoch 28 | 1600/ 1568 batches | lr 30.00 | ms/batch 286.24 | loss 5.27 | ppl 194.33
-----------------------------------------------------------------------------------------
| end of epoch 28 | time: 491.34s | valid loss 5.33 | valid ppl 207.32
-----------------------------------------------------------------------------------------
| epoch 29 | 200/ 1568 batches | lr 30.00 | ms/batch 289.89 | loss 5.33 | ppl 206.88
| epoch 29 | 400/ 1568 batches | lr 30.00 | ms/batch 285.32 | loss 5.34 | ppl 207.98
| epoch 29 | 600/ 1568 batches | lr 30.00 | ms/batch 288.02 | loss 5.30 | ppl 199.80
| epoch 29 | 800/ 1568 batches | lr 30.00 | ms/batch 284.10 | loss 5.30 | ppl 199.69
| epoch 29 | 1000/ 1568 batches | lr 30.00 | ms/batch 289.12 | loss 5.35 | ppl 211.00
| epoch 29 | 1200/ 1568 batches | lr 30.00 | ms/batch 282.73 | loss 5.29 | ppl 197.40
| epoch 29 | 1400/ 1568 batches | lr 30.00 | ms/batch 285.67 | loss 5.30 | ppl 199.79
| epoch 29 | 1600/ 1568 batches | lr 30.00 | ms/batch 294.46 | loss 5.25 | ppl 191.49
-----------------------------------------------------------------------------------------
| end of epoch 29 | time: 491.75s | valid loss 5.28 | valid ppl 195.60
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 30 | 200/ 1568 batches | lr 30.00 | ms/batch 294.11 | loss 5.32 | ppl 204.83
| epoch 30 | 400/ 1568 batches | lr 30.00 | ms/batch 287.55 | loss 5.31 | ppl 202.49
| epoch 30 | 600/ 1568 batches | lr 30.00 | ms/batch 288.05 | loss 5.29 | ppl 197.83
| epoch 30 | 800/ 1568 batches | lr 30.00 | ms/batch 289.20 | loss 5.29 | ppl 197.63
| epoch 30 | 1000/ 1568 batches | lr 30.00 | ms/batch 286.03 | loss 5.32 | ppl 203.58
| epoch 30 | 1200/ 1568 batches | lr 30.00 | ms/batch 286.54 | loss 5.29 | ppl 198.43
| epoch 30 | 1400/ 1568 batches | lr 30.00 | ms/batch 286.75 | loss 5.28 | ppl 195.71
| epoch 30 | 1600/ 1568 batches | lr 30.00 | ms/batch 286.27 | loss 5.21 | ppl 183.82
-----------------------------------------------------------------------------------------
| end of epoch 30 | time: 490.79s | valid loss 5.26 | valid ppl 192.75
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 31 | 200/ 1568 batches | lr 30.00 | ms/batch 291.37 | loss 5.28 | ppl 197.19
| epoch 31 | 400/ 1568 batches | lr 30.00 | ms/batch 289.67 | loss 5.29 | ppl 199.21
| epoch 31 | 600/ 1568 batches | lr 30.00 | ms/batch 281.60 | loss 5.25 | ppl 191.05
| epoch 31 | 800/ 1568 batches | lr 30.00 | ms/batch 288.12 | loss 5.25 | ppl 190.76
| epoch 31 | 1000/ 1568 batches | lr 30.00 | ms/batch 290.11 | loss 5.29 | ppl 199.16
| epoch 31 | 1200/ 1568 batches | lr 30.00 | ms/batch 287.65 | loss 5.25 | ppl 191.24
| epoch 31 | 1400/ 1568 batches | lr 30.00 | ms/batch 289.38 | loss 5.23 | ppl 187.11
| epoch 31 | 1600/ 1568 batches | lr 30.00 | ms/batch 291.60 | loss 5.20 | ppl 181.38
-----------------------------------------------------------------------------------------
| end of epoch 31 | time: 491.68s | valid loss 5.23 | valid ppl 186.42
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 32 | 200/ 1568 batches | lr 30.00 | ms/batch 290.44 | loss 5.26 | ppl 191.65
| epoch 32 | 400/ 1568 batches | lr 30.00 | ms/batch 289.42 | loss 5.25 | ppl 190.17
| epoch 32 | 600/ 1568 batches | lr 30.00 | ms/batch 287.73 | loss 5.25 | ppl 190.73
| epoch 32 | 800/ 1568 batches | lr 30.00 | ms/batch 287.14 | loss 5.24 | ppl 187.90
| epoch 32 | 1000/ 1568 batches | lr 30.00 | ms/batch 281.27 | loss 5.28 | ppl 196.79
| epoch 32 | 1200/ 1568 batches | lr 30.00 | ms/batch 291.18 | loss 5.23 | ppl 186.38
| epoch 32 | 1400/ 1568 batches | lr 30.00 | ms/batch 286.71 | loss 5.22 | ppl 185.30
| epoch 32 | 1600/ 1568 batches | lr 30.00 | ms/batch 291.80 | loss 5.18 | ppl 177.82
-----------------------------------------------------------------------------------------
| end of epoch 32 | time: 492.09s | valid loss 5.22 | valid ppl 184.20
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 33 | 200/ 1568 batches | lr 30.00 | ms/batch 286.83 | loss 5.24 | ppl 188.17
| epoch 33 | 400/ 1568 batches | lr 30.00 | ms/batch 286.76 | loss 5.24 | ppl 187.73
| epoch 33 | 600/ 1568 batches | lr 30.00 | ms/batch 281.31 | loss 5.20 | ppl 181.90
| epoch 33 | 800/ 1568 batches | lr 30.00 | ms/batch 284.26 | loss 5.20 | ppl 181.01
| epoch 33 | 1000/ 1568 batches | lr 30.00 | ms/batch 286.97 | loss 5.27 | ppl 194.13
| epoch 33 | 1200/ 1568 batches | lr 30.00 | ms/batch 286.45 | loss 5.19 | ppl 180.25
| epoch 33 | 1400/ 1568 batches | lr 30.00 | ms/batch 288.64 | loss 5.20 | ppl 181.07
| epoch 33 | 1600/ 1568 batches | lr 30.00 | ms/batch 288.73 | loss 5.18 | ppl 178.06
-----------------------------------------------------------------------------------------
| end of epoch 33 | time: 492.05s | valid loss 5.20 | valid ppl 181.17
-----------------------------------------------------------------------------------------
Saving Normal!
| epoch 34 | 200/ 1568 batches | lr 30.00 | ms/batch 287.44 | loss 5.22 | ppl 184.23
| epoch 34 | 400/ 1568 batches | lr 30.00 | ms/batch 285.28 | loss 5.22 | ppl 185.39
| epoch 34 | 600/ 1568 batches | lr 30.00 | ms/batch 284.44 | loss 5.19 | ppl 178.99
| epoch 34 | 800/ 1568 batches | lr 30.00 | ms/batch 288.65 | loss 5.18 | ppl 177.88
| epoch 34 | 1000/ 1568 batches | lr 30.00 | ms/batch 288.90 | loss 5.25 | ppl 189.63
| epoch 34 | 1200/ 1568 batches | lr 30.00 | ms/batch 289.98 | loss 5.17 | ppl 175.43
| epoch 34 | 1400/ 1568 batches | lr 30.00 | ms/batch 290.15 | loss 5.18 | ppl 177.07
| epoch 34 | 1600/ 1568 batches | lr 30.00 | ms/batch 292.38 | loss 5.14 | ppl 170.45
-----------------------------------------------------------------------------------------
| end of epoch 34 | time: 490.70s | valid loss 5.35 | valid ppl 210.61
-----------------------------------------------------------------------------------------
Switching!
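`Switching!` marks the non-monotonic trigger (`nonmono=5` in the Args line) firing. As we understand awd-lstm-lm's `main.py`, it keeps a list of every epoch's validation loss and switches from SGD to averaged SGD (`torch.optim.ASGD`) once the current loss is worse than the best value recorded more than `nonmono` epochs earlier. The sketch below is a reconstruction of that rule, not the repository's code; replayed on the logged validation perplexities, it fires at epoch 34 (210.61, worse than 200.37 at epoch 27). From epoch 35 on, the checkpoint messages read `Saving Averaged!` and each epoch takes about 510 s instead of about 492 s.

```python
# Validation perplexities copied from the "end of epoch" lines, epochs 1 to 34.
VALID_PPL = [
    1136.90, 964.90, 623.10, 546.87, 735.60, 428.68, 406.51, 389.44, 414.30, 333.82,  # 1-10
    369.69, 326.58, 355.47, 277.95, 275.50, 276.52, 258.71, 245.84, 239.88, 233.70,   # 11-20
    239.05, 221.28, 218.76, 213.49, 207.61, 213.98, 200.37, 207.32, 195.60, 192.75,   # 21-30
    186.42, 184.20, 181.17, 210.61,                                                   # 31-34
]


def first_switch_epoch(valid_ppl, nonmono=5):
    """Return the 1-based epoch at which the non-monotonic trigger fires, or None.

    Reconstructed rule (an assumption): switch once the current value is worse
    than the best value logged more than `nonmono` epochs ago.
    """
    history = []  # every epoch's validation metric, not only the best ones
    for epoch, ppl in enumerate(valid_ppl, start=1):
        if len(history) > nonmono and ppl > min(history[:-nonmono]):
            return epoch
        history.append(ppl)
    return None


print(first_switch_epoch(VALID_PPL))  # 34
```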
| epoch 35 | 200/ 1568 batches | lr 30.00 | ms/batch 298.59 | loss 5.20 | ppl 180.72 | epoch 35 | 400/ 1568 batches | lr 30.00 | ms/batch 297.71 | loss 5.19 | ppl 179.77 | epoch 35 | 600/ 1568 batches | lr 30.00 | ms/batch 299.52 | loss 5.18 | ppl 178.20 | epoch 35 | 800/ 1568 batches | lr 30.00 | ms/batch 295.21 | loss 5.16 | ppl 173.51 | epoch 35 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.65 | loss 5.23 | ppl 186.56 | epoch 35 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.94 | loss 5.15 | ppl 172.08 | epoch 35 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.85 | loss 5.15 | ppl 171.96 | epoch 35 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.88 | loss 5.12 | ppl 166.93 ----------------------------------------------------------------------------------------- | end of epoch 35 | time: 510.61s | valid loss 5.07 | valid ppl 159.90 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 36 | 200/ 1568 batches | lr 30.00 | ms/batch 300.58 | loss 5.19 | ppl 179.71 | epoch 36 | 400/ 1568 batches | lr 30.00 | ms/batch 298.83 | loss 5.17 | ppl 176.79 | epoch 36 | 600/ 1568 batches | lr 30.00 | ms/batch 296.45 | loss 5.15 | ppl 173.19 | epoch 36 | 800/ 1568 batches | lr 30.00 | ms/batch 303.45 | loss 5.14 | ppl 171.12 | epoch 36 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.15 | loss 5.19 | ppl 179.34 | epoch 36 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.11 | loss 5.15 | ppl 171.76 | epoch 36 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 5.14 | ppl 170.30 | epoch 36 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.05 | loss 5.11 | ppl 165.60 ----------------------------------------------------------------------------------------- | end of epoch 36 | time: 509.56s | valid loss 5.06 | valid ppl 157.04 ----------------------------------------------------------------------------------------- Saving Averaged! 
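From epoch 35 on, each epoch ends with "Saving Averaged!" instead of "Saving Normal!". My reading, which is an assumption about main.py, is that after the switch the script evaluates and checkpoints a running average of the parameter iterates kept by the ASGD optimizer, not the current raw weights. The minimal sketch below shows that kind of iterate averaging on plain Python lists. The step size mu = 1 / max(1, step - t0) is assumed to mirror the schedule in PyTorch's `torch.optim.ASGD`. With t0 = 0, that schedule makes the average the uniform mean of every iterate since averaging began. The function name and the iterates are hypothetical.

```python
from typing import List

def update_average(avg: List[float], params: List[float], step: int, t0: int = 0) -> List[float]:
    """One running-average update of parameter iterates (hypothetical helper).

    mu = 1 / max(1, step - t0); with t0 = 0 the result after n steps is the
    plain mean of the first n iterates.
    """
    mu = 1.0 / max(1, step - t0)
    return [a + mu * (p - a) for a, p in zip(avg, params)]

# Hypothetical 2-parameter iterates, not taken from the log.
iterates = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]
avg = [0.0, 0.0]
for step, params in enumerate(iterates, start=1):
    avg = update_average(avg, params, step)
print(avg)  # [3.0, 2.0], the element-wise mean of the three iterates
```

Averaging smooths out the step-to-step noise that a large constant learning rate (lr 30.00 throughout this run) puts into individual iterates. That is the usual motivation for evaluating the averaged weights.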
| epoch 37 | 200/ 1568 batches | lr 30.00 | ms/batch 298.46 | loss 5.16 | ppl 174.75 | epoch 37 | 400/ 1568 batches | lr 30.00 | ms/batch 297.60 | loss 5.14 | ppl 171.22 | epoch 37 | 600/ 1568 batches | lr 30.00 | ms/batch 299.96 | loss 5.13 | ppl 168.74 | epoch 37 | 800/ 1568 batches | lr 30.00 | ms/batch 288.76 | loss 5.11 | ppl 166.37 | epoch 37 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.62 | loss 5.18 | ppl 177.16 | epoch 37 | 1200/ 1568 batches | lr 30.00 | ms/batch 294.17 | loss 5.10 | ppl 164.49 | epoch 37 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.16 | loss 5.13 | ppl 169.04 | epoch 37 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.76 | loss 5.09 | ppl 162.53 ----------------------------------------------------------------------------------------- | end of epoch 37 | time: 509.70s | valid loss 5.04 | valid ppl 155.05 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 38 | 200/ 1568 batches | lr 30.00 | ms/batch 299.11 | loss 5.15 | ppl 173.29 | epoch 38 | 400/ 1568 batches | lr 30.00 | ms/batch 294.70 | loss 5.13 | ppl 169.64 | epoch 38 | 600/ 1568 batches | lr 30.00 | ms/batch 299.35 | loss 5.13 | ppl 169.50 | epoch 38 | 800/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 5.12 | ppl 167.56 | epoch 38 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.54 | loss 5.16 | ppl 173.90 | epoch 38 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.77 | loss 5.10 | ppl 164.14 | epoch 38 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 5.11 | ppl 164.87 | epoch 38 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.70 | loss 5.07 | ppl 158.42 ----------------------------------------------------------------------------------------- | end of epoch 38 | time: 510.61s | valid loss 5.03 | valid ppl 153.45 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 39 | 200/ 1568 batches | lr 30.00 | ms/batch 303.99 | loss 5.14 | ppl 170.47 | epoch 39 | 400/ 1568 batches | lr 30.00 | ms/batch 300.12 | loss 5.12 | ppl 167.84 | epoch 39 | 600/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 5.11 | ppl 164.97 | epoch 39 | 800/ 1568 batches | lr 30.00 | ms/batch 303.39 | loss 5.12 | ppl 166.90 | epoch 39 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.37 | loss 5.13 | ppl 168.61 | epoch 39 | 1200/ 1568 batches | lr 30.00 | ms/batch 294.73 | loss 5.08 | ppl 161.52 | epoch 39 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 5.09 | ppl 162.29 | epoch 39 | 1600/ 1568 batches | lr 30.00 | ms/batch 294.69 | loss 5.04 | ppl 154.71 ----------------------------------------------------------------------------------------- | end of epoch 39 | time: 510.36s | valid loss 5.02 | valid ppl 152.00 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 40 | 200/ 1568 batches | lr 30.00 | ms/batch 297.36 | loss 5.11 | ppl 164.90 | epoch 40 | 400/ 1568 batches | lr 30.00 | ms/batch 296.25 | loss 5.10 | ppl 164.25 | epoch 40 | 600/ 1568 batches | lr 30.00 | ms/batch 297.14 | loss 5.09 | ppl 161.87 | epoch 40 | 800/ 1568 batches | lr 30.00 | ms/batch 296.99 | loss 5.07 | ppl 159.73 | epoch 40 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.68 | loss 5.13 | ppl 168.65 | epoch 40 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.57 | loss 5.06 | ppl 156.98 | epoch 40 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.65 | loss 5.08 | ppl 160.52 | epoch 40 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.52 | loss 5.04 | ppl 154.45 ----------------------------------------------------------------------------------------- | end of epoch 40 | time: 510.16s | valid loss 5.02 | valid ppl 150.74 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 41 | 200/ 1568 batches | lr 30.00 | ms/batch 301.90 | loss 5.11 | ppl 165.52 | epoch 41 | 400/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 5.11 | ppl 164.86 | epoch 41 | 600/ 1568 batches | lr 30.00 | ms/batch 296.58 | loss 5.07 | ppl 158.57 | epoch 41 | 800/ 1568 batches | lr 30.00 | ms/batch 292.80 | loss 5.04 | ppl 155.05 | epoch 41 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.95 | loss 5.11 | ppl 166.03 | epoch 41 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.62 | loss 5.05 | ppl 156.32 | epoch 41 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 5.05 | ppl 156.02 | epoch 41 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.65 | loss 5.03 | ppl 152.44 ----------------------------------------------------------------------------------------- | end of epoch 41 | time: 509.73s | valid loss 5.01 | valid ppl 149.64 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 42 | 200/ 1568 batches | lr 30.00 | ms/batch 301.85 | loss 5.08 | ppl 160.32 | epoch 42 | 400/ 1568 batches | lr 30.00 | ms/batch 298.96 | loss 5.08 | ppl 161.25 | epoch 42 | 600/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 5.06 | ppl 157.48 | epoch 42 | 800/ 1568 batches | lr 30.00 | ms/batch 298.56 | loss 5.06 | ppl 157.05 | epoch 42 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.75 | loss 5.10 | ppl 163.33 | epoch 42 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.48 | loss 5.04 | ppl 155.22 | epoch 42 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.85 | loss 5.03 | ppl 153.26 | epoch 42 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.60 | loss 5.00 | ppl 148.93 ----------------------------------------------------------------------------------------- | end of epoch 42 | time: 510.17s | valid loss 5.00 | valid ppl 148.59 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 43 | 200/ 1568 batches | lr 30.00 | ms/batch 299.27 | loss 5.06 | ppl 157.67 | epoch 43 | 400/ 1568 batches | lr 30.00 | ms/batch 295.82 | loss 5.07 | ppl 158.82 | epoch 43 | 600/ 1568 batches | lr 30.00 | ms/batch 296.63 | loss 5.04 | ppl 154.71 | epoch 43 | 800/ 1568 batches | lr 30.00 | ms/batch 296.02 | loss 5.03 | ppl 152.95 | epoch 43 | 1000/ 1568 batches | lr 30.00 | ms/batch 293.83 | loss 5.09 | ppl 161.80 | epoch 43 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.32 | loss 5.02 | ppl 151.41 | epoch 43 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.03 | loss 5.02 | ppl 151.10 | epoch 43 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.31 | loss 5.01 | ppl 149.69 ----------------------------------------------------------------------------------------- | end of epoch 43 | time: 509.71s | valid loss 4.99 | valid ppl 147.63 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 44 | 200/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 5.05 | ppl 155.26 | epoch 44 | 400/ 1568 batches | lr 30.00 | ms/batch 297.63 | loss 5.04 | ppl 154.65 | epoch 44 | 600/ 1568 batches | lr 30.00 | ms/batch 296.35 | loss 5.03 | ppl 153.52 | epoch 44 | 800/ 1568 batches | lr 30.00 | ms/batch 299.04 | loss 5.00 | ppl 148.65 | epoch 44 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.95 | loss 5.06 | ppl 158.22 | epoch 44 | 1200/ 1568 batches | lr 30.00 | ms/batch 293.56 | loss 5.01 | ppl 149.70 | epoch 44 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.47 | loss 5.00 | ppl 148.66 | epoch 44 | 1600/ 1568 batches | lr 30.00 | ms/batch 293.75 | loss 4.99 | ppl 147.11 ----------------------------------------------------------------------------------------- | end of epoch 44 | time: 509.35s | valid loss 4.99 | valid ppl 146.72 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 45 | 200/ 1568 batches | lr 30.00 | ms/batch 301.84 | loss 5.04 | ppl 155.03 | epoch 45 | 400/ 1568 batches | lr 30.00 | ms/batch 296.42 | loss 5.04 | ppl 154.60 | epoch 45 | 600/ 1568 batches | lr 30.00 | ms/batch 294.13 | loss 5.01 | ppl 149.36 | epoch 45 | 800/ 1568 batches | lr 30.00 | ms/batch 299.34 | loss 5.00 | ppl 149.08 | epoch 45 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.31 | loss 5.05 | ppl 155.52 | epoch 45 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.99 | ppl 146.66 | epoch 45 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.95 | loss 4.98 | ppl 146.01 | epoch 45 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.62 | loss 4.95 | ppl 141.80 ----------------------------------------------------------------------------------------- | end of epoch 45 | time: 509.55s | valid loss 4.98 | valid ppl 145.85 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 46 | 200/ 1568 batches | lr 30.00 | ms/batch 300.82 | loss 5.02 | ppl 151.75 | epoch 46 | 400/ 1568 batches | lr 30.00 | ms/batch 297.12 | loss 5.01 | ppl 149.24 | epoch 46 | 600/ 1568 batches | lr 30.00 | ms/batch 296.13 | loss 5.00 | ppl 149.15 | epoch 46 | 800/ 1568 batches | lr 30.00 | ms/batch 302.59 | loss 4.99 | ppl 147.67 | epoch 46 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.77 | loss 5.02 | ppl 151.12 | epoch 46 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.58 | loss 4.96 | ppl 142.48 | epoch 46 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.86 | loss 4.96 | ppl 143.23 | epoch 46 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.01 | loss 4.94 | ppl 140.29 ----------------------------------------------------------------------------------------- | end of epoch 46 | time: 508.89s | valid loss 4.98 | valid ppl 145.03 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 47 | 200/ 1568 batches | lr 30.00 | ms/batch 300.84 | loss 5.01 | ppl 150.49 | epoch 47 | 400/ 1568 batches | lr 30.00 | ms/batch 294.83 | loss 5.01 | ppl 149.55 | epoch 47 | 600/ 1568 batches | lr 30.00 | ms/batch 297.42 | loss 4.97 | ppl 144.57 | epoch 47 | 800/ 1568 batches | lr 30.00 | ms/batch 301.91 | loss 4.96 | ppl 142.95 | epoch 47 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.61 | loss 5.02 | ppl 151.04 | epoch 47 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.58 | loss 4.97 | ppl 144.51 | epoch 47 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.95 | loss 4.97 | ppl 144.61 | epoch 47 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.12 | loss 4.92 | ppl 137.31 ----------------------------------------------------------------------------------------- | end of epoch 47 | time: 510.04s | valid loss 4.97 | valid ppl 144.27 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 48 | 200/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.99 | ppl 147.45 | epoch 48 | 400/ 1568 batches | lr 30.00 | ms/batch 299.08 | loss 5.00 | ppl 148.19 | epoch 48 | 600/ 1568 batches | lr 30.00 | ms/batch 301.29 | loss 4.96 | ppl 143.19 | epoch 48 | 800/ 1568 batches | lr 30.00 | ms/batch 300.24 | loss 4.96 | ppl 142.72 | epoch 48 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.73 | loss 5.00 | ppl 148.24 | epoch 48 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.61 | loss 4.96 | ppl 142.10 | epoch 48 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.63 | loss 4.96 | ppl 142.66 | epoch 48 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.56 | loss 4.92 | ppl 137.07 ----------------------------------------------------------------------------------------- | end of epoch 48 | time: 508.82s | valid loss 4.97 | valid ppl 143.52 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 49 | 200/ 1568 batches | lr 30.00 | ms/batch 297.75 | loss 4.98 | ppl 145.30 | epoch 49 | 400/ 1568 batches | lr 30.00 | ms/batch 296.57 | loss 4.97 | ppl 143.84 | epoch 49 | 600/ 1568 batches | lr 30.00 | ms/batch 298.64 | loss 4.96 | ppl 142.85 | epoch 49 | 800/ 1568 batches | lr 30.00 | ms/batch 299.41 | loss 4.94 | ppl 139.53 | epoch 49 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.93 | loss 4.99 | ppl 146.99 | epoch 49 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.06 | loss 4.93 | ppl 138.20 | epoch 49 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.95 | loss 4.93 | ppl 139.03 | epoch 49 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.92 | ppl 137.09 ----------------------------------------------------------------------------------------- | end of epoch 49 | time: 509.12s | valid loss 4.96 | valid ppl 142.81 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 50 | 200/ 1568 batches | lr 30.00 | ms/batch 299.30 | loss 4.98 | ppl 144.76 | epoch 50 | 400/ 1568 batches | lr 30.00 | ms/batch 298.19 | loss 4.95 | ppl 140.81 | epoch 50 | 600/ 1568 batches | lr 30.00 | ms/batch 299.23 | loss 4.93 | ppl 138.29 | epoch 50 | 800/ 1568 batches | lr 30.00 | ms/batch 297.43 | loss 4.94 | ppl 140.03 | epoch 50 | 1000/ 1568 batches | lr 30.00 | ms/batch 295.81 | loss 4.98 | ppl 144.89 | epoch 50 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.89 | loss 4.91 | ppl 135.67 | epoch 50 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.28 | loss 4.91 | ppl 135.25 | epoch 50 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.07 | loss 4.91 | ppl 136.19 ----------------------------------------------------------------------------------------- | end of epoch 50 | time: 509.02s | valid loss 4.96 | valid ppl 142.12 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 51 | 200/ 1568 batches | lr 30.00 | ms/batch 303.01 | loss 4.95 | ppl 141.18 | epoch 51 | 400/ 1568 batches | lr 30.00 | ms/batch 294.96 | loss 4.95 | ppl 140.47 | epoch 51 | 600/ 1568 batches | lr 30.00 | ms/batch 293.46 | loss 4.94 | ppl 139.50 | epoch 51 | 800/ 1568 batches | lr 30.00 | ms/batch 297.87 | loss 4.92 | ppl 137.25 | epoch 51 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.23 | loss 4.96 | ppl 142.98 | epoch 51 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.97 | loss 4.93 | ppl 138.33 | epoch 51 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.97 | loss 4.91 | ppl 135.95 | epoch 51 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.25 | loss 4.87 | ppl 129.80 ----------------------------------------------------------------------------------------- | end of epoch 51 | time: 509.42s | valid loss 4.95 | valid ppl 141.47 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 52 | 200/ 1568 batches | lr 30.00 | ms/batch 298.81 | loss 4.93 | ppl 138.40 | epoch 52 | 400/ 1568 batches | lr 30.00 | ms/batch 296.40 | loss 4.94 | ppl 139.15 | epoch 52 | 600/ 1568 batches | lr 30.00 | ms/batch 302.48 | loss 4.94 | ppl 139.80 | epoch 52 | 800/ 1568 batches | lr 30.00 | ms/batch 294.21 | loss 4.90 | ppl 134.61 | epoch 52 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.98 | loss 4.95 | ppl 140.90 | epoch 52 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.60 | loss 4.90 | ppl 134.37 | epoch 52 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.32 | loss 4.90 | ppl 134.14 | epoch 52 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.33 | loss 4.88 | ppl 132.19 ----------------------------------------------------------------------------------------- | end of epoch 52 | time: 509.87s | valid loss 4.95 | valid ppl 140.88 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 53 | 200/ 1568 batches | lr 30.00 | ms/batch 295.83 | loss 4.93 | ppl 138.04 | epoch 53 | 400/ 1568 batches | lr 30.00 | ms/batch 298.63 | loss 4.93 | ppl 138.07 | epoch 53 | 600/ 1568 batches | lr 30.00 | ms/batch 300.00 | loss 4.89 | ppl 132.84 | epoch 53 | 800/ 1568 batches | lr 30.00 | ms/batch 301.13 | loss 4.90 | ppl 134.71 | epoch 53 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.94 | ppl 140.39 | epoch 53 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.14 | loss 4.89 | ppl 132.93 | epoch 53 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.02 | loss 4.89 | ppl 132.50 | epoch 53 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.87 | ppl 130.67 ----------------------------------------------------------------------------------------- | end of epoch 53 | time: 512.01s | valid loss 4.94 | valid ppl 140.31 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 54 | 200/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.91 | ppl 135.82 | epoch 54 | 400/ 1568 batches | lr 30.00 | ms/batch 301.02 | loss 4.91 | ppl 136.09 | epoch 54 | 600/ 1568 batches | lr 30.00 | ms/batch 299.44 | loss 4.89 | ppl 132.34 | epoch 54 | 800/ 1568 batches | lr 30.00 | ms/batch 301.51 | loss 4.90 | ppl 133.85 | epoch 54 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.83 | loss 4.93 | ppl 137.90 | epoch 54 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.43 | loss 4.89 | ppl 133.37 | epoch 54 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.40 | loss 4.88 | ppl 131.67 | epoch 54 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.45 | loss 4.83 | ppl 125.76 ----------------------------------------------------------------------------------------- | end of epoch 54 | time: 511.55s | valid loss 4.94 | valid ppl 139.76 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 55 | 200/ 1568 batches | lr 30.00 | ms/batch 305.96 | loss 4.92 | ppl 137.01 | epoch 55 | 400/ 1568 batches | lr 30.00 | ms/batch 295.27 | loss 4.88 | ppl 132.16 | epoch 55 | 600/ 1568 batches | lr 30.00 | ms/batch 299.81 | loss 4.88 | ppl 132.23 | epoch 55 | 800/ 1568 batches | lr 30.00 | ms/batch 300.91 | loss 4.87 | ppl 130.72 | epoch 55 | 1000/ 1568 batches | lr 30.00 | ms/batch 295.53 | loss 4.90 | ppl 134.66 | epoch 55 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.06 | loss 4.87 | ppl 129.72 | epoch 55 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.89 | loss 4.86 | ppl 129.32 | epoch 55 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.02 | loss 4.84 | ppl 125.99 ----------------------------------------------------------------------------------------- | end of epoch 55 | time: 511.56s | valid loss 4.94 | valid ppl 139.21 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 56 | 200/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.90 | ppl 133.62 | epoch 56 | 400/ 1568 batches | lr 30.00 | ms/batch 302.23 | loss 4.89 | ppl 133.45 | epoch 56 | 600/ 1568 batches | lr 30.00 | ms/batch 299.67 | loss 4.85 | ppl 128.36 | epoch 56 | 800/ 1568 batches | lr 30.00 | ms/batch 303.75 | loss 4.87 | ppl 130.53 | epoch 56 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.88 | ppl 132.02 | epoch 56 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.85 | ppl 127.74 | epoch 56 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.86 | ppl 129.36 | epoch 56 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.60 | loss 4.83 | ppl 125.43 ----------------------------------------------------------------------------------------- | end of epoch 56 | time: 512.23s | valid loss 4.93 | valid ppl 138.68 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 57 | 200/ 1568 batches | lr 30.00 | ms/batch 304.99 | loss 4.89 | ppl 133.40 | epoch 57 | 400/ 1568 batches | lr 30.00 | ms/batch 302.94 | loss 4.87 | ppl 130.40 | epoch 57 | 600/ 1568 batches | lr 30.00 | ms/batch 300.66 | loss 4.85 | ppl 128.36 | epoch 57 | 800/ 1568 batches | lr 30.00 | ms/batch 302.46 | loss 4.86 | ppl 129.12 | epoch 57 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.38 | loss 4.90 | ppl 134.46 | epoch 57 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.40 | loss 4.87 | ppl 130.25 | epoch 57 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.58 | loss 4.85 | ppl 127.24 | epoch 57 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.80 | ppl 121.33 ----------------------------------------------------------------------------------------- | end of epoch 57 | time: 512.27s | valid loss 4.93 | valid ppl 138.17 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 58 | 200/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.88 | ppl 131.40 | epoch 58 | 400/ 1568 batches | lr 30.00 | ms/batch 301.10 | loss 4.87 | ppl 130.58 | epoch 58 | 600/ 1568 batches | lr 30.00 | ms/batch 302.41 | loss 4.86 | ppl 128.91 | epoch 58 | 800/ 1568 batches | lr 30.00 | ms/batch 299.21 | loss 4.85 | ppl 127.30 | epoch 58 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.45 | loss 4.87 | ppl 130.79 | epoch 58 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.79 | loss 4.86 | ppl 129.20 | epoch 58 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.75 | loss 4.85 | ppl 127.12 | epoch 58 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.23 | loss 4.82 | ppl 123.43 ----------------------------------------------------------------------------------------- | end of epoch 58 | time: 512.83s | valid loss 4.92 | valid ppl 137.67 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 59 | 200/ 1568 batches | lr 30.00 | ms/batch 304.83 | loss 4.86 | ppl 129.66 | epoch 59 | 400/ 1568 batches | lr 30.00 | ms/batch 295.39 | loss 4.86 | ppl 128.96 | epoch 59 | 600/ 1568 batches | lr 30.00 | ms/batch 296.35 | loss 4.84 | ppl 126.03 | epoch 59 | 800/ 1568 batches | lr 30.00 | ms/batch 304.53 | loss 4.83 | ppl 124.73 | epoch 59 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.44 | loss 4.87 | ppl 129.79 | epoch 59 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.37 | loss 4.84 | ppl 126.72 | epoch 59 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.10 | loss 4.83 | ppl 124.72 | epoch 59 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.26 | loss 4.80 | ppl 121.92 ----------------------------------------------------------------------------------------- | end of epoch 59 | time: 512.97s | valid loss 4.92 | valid ppl 137.20 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 60 | 200/ 1568 batches | lr 30.00 | ms/batch 306.49 | loss 4.86 | ppl 128.97 | epoch 60 | 400/ 1568 batches | lr 30.00 | ms/batch 303.06 | loss 4.87 | ppl 129.81 | epoch 60 | 600/ 1568 batches | lr 30.00 | ms/batch 300.48 | loss 4.83 | ppl 125.04 | epoch 60 | 800/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.83 | ppl 125.13 | epoch 60 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.27 | loss 4.87 | ppl 129.76 | epoch 60 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.25 | loss 4.82 | ppl 124.45 | epoch 60 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.97 | loss 4.81 | ppl 122.70 | epoch 60 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.96 | loss 4.78 | ppl 119.16 ----------------------------------------------------------------------------------------- | end of epoch 60 | time: 512.08s | valid loss 4.92 | valid ppl 136.74 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 61 | 200/ 1568 batches | lr 30.00 | ms/batch 299.30 | loss 4.85 | ppl 127.70 | epoch 61 | 400/ 1568 batches | lr 30.00 | ms/batch 299.42 | loss 4.84 | ppl 126.94 | epoch 61 | 600/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.84 | ppl 126.66 | epoch 61 | 800/ 1568 batches | lr 30.00 | ms/batch 305.05 | loss 4.82 | ppl 123.68 | epoch 61 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.92 | loss 4.84 | ppl 125.91 | epoch 61 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.42 | loss 4.81 | ppl 122.43 | epoch 61 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.68 | loss 4.80 | ppl 121.46 | epoch 61 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.09 | loss 4.79 | ppl 120.05 ----------------------------------------------------------------------------------------- | end of epoch 61 | time: 512.93s | valid loss 4.91 | valid ppl 136.28 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 62 | 200/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.83 | ppl 125.55 | epoch 62 | 400/ 1568 batches | lr 30.00 | ms/batch 303.22 | loss 4.83 | ppl 125.28 | epoch 62 | 600/ 1568 batches | lr 30.00 | ms/batch 303.17 | loss 4.80 | ppl 121.12 | epoch 62 | 800/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 4.81 | ppl 122.46 | epoch 62 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.63 | loss 4.84 | ppl 126.60 | epoch 62 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.39 | loss 4.81 | ppl 122.91 | epoch 62 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.61 | loss 4.80 | ppl 121.42 | epoch 62 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.60 | loss 4.77 | ppl 117.56 ----------------------------------------------------------------------------------------- | end of epoch 62 | time: 512.91s | valid loss 4.91 | valid ppl 135.83 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 63 | 200/ 1568 batches | lr 30.00 | ms/batch 303.92 | loss 4.82 | ppl 124.19 | epoch 63 | 400/ 1568 batches | lr 30.00 | ms/batch 300.07 | loss 4.82 | ppl 123.91 | epoch 63 | 600/ 1568 batches | lr 30.00 | ms/batch 306.02 | loss 4.79 | ppl 120.81 | epoch 63 | 800/ 1568 batches | lr 30.00 | ms/batch 300.52 | loss 4.79 | ppl 120.57 | epoch 63 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.31 | loss 4.83 | ppl 125.52 | epoch 63 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.98 | loss 4.80 | ppl 121.24 | epoch 63 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.71 | loss 4.77 | ppl 117.78 | epoch 63 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.33 | loss 4.76 | ppl 116.78 ----------------------------------------------------------------------------------------- | end of epoch 63 | time: 513.19s | valid loss 4.91 | valid ppl 135.40 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 64 | 200/ 1568 batches | lr 30.00 | ms/batch 302.61 | loss 4.83 | ppl 124.98 | epoch 64 | 400/ 1568 batches | lr 30.00 | ms/batch 300.22 | loss 4.80 | ppl 121.24 | epoch 64 | 600/ 1568 batches | lr 30.00 | ms/batch 302.11 | loss 4.79 | ppl 120.72 | epoch 64 | 800/ 1568 batches | lr 30.00 | ms/batch 297.65 | loss 4.78 | ppl 119.04 | epoch 64 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.38 | loss 4.83 | ppl 125.79 | epoch 64 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.78 | ppl 119.31 | epoch 64 | 1400/ 1568 batches | lr 30.00 | ms/batch 296.75 | loss 4.77 | ppl 117.98 | epoch 64 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.32 | loss 4.76 | ppl 116.99 ----------------------------------------------------------------------------------------- | end of epoch 64 | time: 513.42s | valid loss 4.91 | valid ppl 134.98 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 65 | 200/ 1568 batches | lr 30.00 | ms/batch 305.64 | loss 4.83 | ppl 124.84 | epoch 65 | 400/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.80 | ppl 121.94 | epoch 65 | 600/ 1568 batches | lr 30.00 | ms/batch 300.21 | loss 4.78 | ppl 119.14 | epoch 65 | 800/ 1568 batches | lr 30.00 | ms/batch 298.86 | loss 4.77 | ppl 118.35 | epoch 65 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.83 | ppl 124.90 | epoch 65 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.25 | loss 4.78 | ppl 119.03 | epoch 65 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.51 | loss 4.76 | ppl 117.31 | epoch 65 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.73 | ppl 113.43 ----------------------------------------------------------------------------------------- | end of epoch 65 | time: 513.89s | valid loss 4.90 | valid ppl 134.57 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 66 | 200/ 1568 batches | lr 30.00 | ms/batch 300.14 | loss 4.81 | ppl 123.13 | epoch 66 | 400/ 1568 batches | lr 30.00 | ms/batch 295.43 | loss 4.80 | ppl 121.87 | epoch 66 | 600/ 1568 batches | lr 30.00 | ms/batch 295.51 | loss 4.76 | ppl 117.00 | epoch 66 | 800/ 1568 batches | lr 30.00 | ms/batch 300.04 | loss 4.76 | ppl 116.33 | epoch 66 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.72 | loss 4.82 | ppl 123.97 | epoch 66 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.72 | loss 4.77 | ppl 117.34 | epoch 66 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.29 | loss 4.76 | ppl 116.24 | epoch 66 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.29 | loss 4.75 | ppl 116.14 ----------------------------------------------------------------------------------------- | end of epoch 66 | time: 513.72s | valid loss 4.90 | valid ppl 134.18 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 67 | 200/ 1568 batches | lr 30.00 | ms/batch 303.86 | loss 4.79 | ppl 120.22 | epoch 67 | 400/ 1568 batches | lr 30.00 | ms/batch 299.74 | loss 4.77 | ppl 118.47 | epoch 67 | 600/ 1568 batches | lr 30.00 | ms/batch 300.07 | loss 4.76 | ppl 117.14 | epoch 67 | 800/ 1568 batches | lr 30.00 | ms/batch 301.28 | loss 4.76 | ppl 116.91 | epoch 67 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.14 | loss 4.82 | ppl 123.69 | epoch 67 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.06 | loss 4.76 | ppl 116.91 | epoch 67 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.74 | ppl 114.77 | epoch 67 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.17 | loss 4.72 | ppl 112.55 ----------------------------------------------------------------------------------------- | end of epoch 67 | time: 514.19s | valid loss 4.90 | valid ppl 133.81 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 68 | 200/ 1568 batches | lr 30.00 | ms/batch 302.04 | loss 4.78 | ppl 119.12 | epoch 68 | 400/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.77 | ppl 117.87 | epoch 68 | 600/ 1568 batches | lr 30.00 | ms/batch 297.49 | loss 4.76 | ppl 116.82 | epoch 68 | 800/ 1568 batches | lr 30.00 | ms/batch 299.51 | loss 4.75 | ppl 115.30 | epoch 68 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.79 | ppl 120.65 | epoch 68 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.73 | loss 4.74 | ppl 114.97 | epoch 68 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.91 | loss 4.75 | ppl 116.06 | epoch 68 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.02 | loss 4.73 | ppl 113.10 ----------------------------------------------------------------------------------------- | end of epoch 68 | time: 513.68s | valid loss 4.89 | valid ppl 133.44 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 69 | 200/ 1568 batches | lr 30.00 | ms/batch 297.92 | loss 4.78 | ppl 119.55
| epoch 69 | 400/ 1568 batches | lr 30.00 | ms/batch 298.65 | loss 4.77 | ppl 117.54
| epoch 69 | 600/ 1568 batches | lr 30.00 | ms/batch 300.78 | loss 4.76 | ppl 117.14
| epoch 69 | 800/ 1568 batches | lr 30.00 | ms/batch 304.01 | loss 4.75 | ppl 115.23
| epoch 69 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.76 | loss 4.80 | ppl 120.97
| epoch 69 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.72 | ppl 112.62
| epoch 69 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.86 | loss 4.75 | ppl 115.25
| epoch 69 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.71 | loss 4.72 | ppl 111.62
-----------------------------------------------------------------------------------------
| end of epoch 69 | time: 513.78s | valid loss 4.89 | valid ppl 133.10
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 70 | 200/ 1568 batches | lr 30.00 | ms/batch 306.91 | loss 4.77 | ppl 117.54
| epoch 70 | 400/ 1568 batches | lr 30.00 | ms/batch 295.20 | loss 4.76 | ppl 116.94
| epoch 70 | 600/ 1568 batches | lr 30.00 | ms/batch 304.76 | loss 4.74 | ppl 114.96
| epoch 70 | 800/ 1568 batches | lr 30.00 | ms/batch 299.16 | loss 4.74 | ppl 113.87
| epoch 70 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.00 | loss 4.78 | ppl 119.08
| epoch 70 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.23 | loss 4.73 | ppl 113.39
| epoch 70 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.72 | loss 4.73 | ppl 113.24
| epoch 70 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.27 | loss 4.71 | ppl 111.25
-----------------------------------------------------------------------------------------
| end of epoch 70 | time: 513.23s | valid loss 4.89 | valid ppl 132.77
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 71 | 200/ 1568 batches | lr 30.00 | ms/batch 304.87 | loss 4.76 | ppl 116.87
| epoch 71 | 400/ 1568 batches | lr 30.00 | ms/batch 300.18 | loss 4.74 | ppl 114.88
| epoch 71 | 600/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.74 | ppl 114.25
| epoch 71 | 800/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.72 | ppl 112.35
| epoch 71 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.52 | loss 4.77 | ppl 118.15
| epoch 71 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.03 | loss 4.73 | ppl 112.74
| epoch 71 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.47 | loss 4.72 | ppl 112.63
| epoch 71 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.55 | loss 4.69 | ppl 109.19
-----------------------------------------------------------------------------------------
| end of epoch 71 | time: 513.71s | valid loss 4.89 | valid ppl 132.44
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 72 | 200/ 1568 batches | lr 30.00 | ms/batch 297.19 | loss 4.78 | ppl 118.77
| epoch 72 | 400/ 1568 batches | lr 30.00 | ms/batch 303.47 | loss 4.74 | ppl 114.57
| epoch 72 | 600/ 1568 batches | lr 30.00 | ms/batch 302.33 | loss 4.71 | ppl 110.93
| epoch 72 | 800/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 4.72 | ppl 111.61
| epoch 72 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.36 | loss 4.76 | ppl 117.24
| epoch 72 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.05 | loss 4.70 | ppl 110.50
| epoch 72 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.15 | loss 4.71 | ppl 111.55
| epoch 72 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.18 | loss 4.69 | ppl 109.11
-----------------------------------------------------------------------------------------
| end of epoch 72 | time: 513.81s | valid loss 4.88 | valid ppl 132.12
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 73 | 200/ 1568 batches | lr 30.00 | ms/batch 305.02 | loss 4.76 | ppl 117.17
| epoch 73 | 400/ 1568 batches | lr 30.00 | ms/batch 298.32 | loss 4.73 | ppl 113.82
| epoch 73 | 600/ 1568 batches | lr 30.00 | ms/batch 300.69 | loss 4.72 | ppl 111.90
| epoch 73 | 800/ 1568 batches | lr 30.00 | ms/batch 300.33 | loss 4.70 | ppl 110.23
| epoch 73 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.54 | loss 4.76 | ppl 116.59
| epoch 73 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.78 | loss 4.71 | ppl 110.93
| epoch 73 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.42 | loss 4.71 | ppl 110.56
| epoch 73 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.04 | loss 4.70 | ppl 109.62
-----------------------------------------------------------------------------------------
| end of epoch 73 | time: 513.45s | valid loss 4.88 | valid ppl 131.81
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 74 | 200/ 1568 batches | lr 30.00 | ms/batch 303.02 | loss 4.75 | ppl 115.96
| epoch 74 | 400/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.72 | ppl 112.62
| epoch 74 | 600/ 1568 batches | lr 30.00 | ms/batch 300.62 | loss 4.71 | ppl 110.90
| epoch 74 | 800/ 1568 batches | lr 30.00 | ms/batch 299.49 | loss 4.70 | ppl 109.85
| epoch 74 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.76 | ppl 116.45
| epoch 74 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.53 | loss 4.71 | ppl 111.17
| epoch 74 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.52 | loss 4.71 | ppl 110.60
| epoch 74 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.70 | loss 4.67 | ppl 106.61
-----------------------------------------------------------------------------------------
| end of epoch 74 | time: 513.49s | valid loss 4.88 | valid ppl 131.50
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 75 | 200/ 1568 batches | lr 30.00 | ms/batch 304.54 | loss 4.75 | ppl 115.80
| epoch 75 | 400/ 1568 batches | lr 30.00 | ms/batch 300.05 | loss 4.70 | ppl 110.14
| epoch 75 | 600/ 1568 batches | lr 30.00 | ms/batch 304.94 | loss 4.70 | ppl 109.80
| epoch 75 | 800/ 1568 batches | lr 30.00 | ms/batch 303.75 | loss 4.70 | ppl 110.04
| epoch 75 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.84 | loss 4.76 | ppl 116.86
| epoch 75 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.92 | loss 4.70 | ppl 110.38
| epoch 75 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.68 | ppl 108.25
| epoch 75 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.06 | loss 4.66 | ppl 105.32
-----------------------------------------------------------------------------------------
| end of epoch 75 | time: 513.59s | valid loss 4.88 | valid ppl 131.20
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 76 | 200/ 1568 batches | lr 30.00 | ms/batch 301.48 | loss 4.74 | ppl 114.44
| epoch 76 | 400/ 1568 batches | lr 30.00 | ms/batch 303.10 | loss 4.71 | ppl 111.06
| epoch 76 | 600/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.69 | ppl 108.79
| epoch 76 | 800/ 1568 batches | lr 30.00 | ms/batch 303.22 | loss 4.70 | ppl 109.77
| epoch 76 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.49 | loss 4.73 | ppl 113.28
| epoch 76 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.70 | ppl 110.10
| epoch 76 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.75 | loss 4.70 | ppl 110.27
| epoch 76 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.05 | loss 4.67 | ppl 106.61
-----------------------------------------------------------------------------------------
| end of epoch 76 | time: 513.79s | valid loss 4.87 | valid ppl 130.91
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 77 | 200/ 1568 batches | lr 30.00 | ms/batch 299.45 | loss 4.72 | ppl 112.49
| epoch 77 | 400/ 1568 batches | lr 30.00 | ms/batch 299.53 | loss 4.70 | ppl 109.48
| epoch 77 | 600/ 1568 batches | lr 30.00 | ms/batch 303.52 | loss 4.68 | ppl 107.51
| epoch 77 | 800/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.68 | ppl 108.06
| epoch 77 | 1000/ 1568 batches | lr 30.00 | ms/batch 295.60 | loss 4.74 | ppl 114.16
| epoch 77 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.72 | loss 4.68 | ppl 107.80
| epoch 77 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.46 | loss 4.68 | ppl 107.84
| epoch 77 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.19 | loss 4.65 | ppl 104.99
-----------------------------------------------------------------------------------------
| end of epoch 77 | time: 513.88s | valid loss 4.87 | valid ppl 130.64
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 78 | 200/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.71 | ppl 111.46
| epoch 78 | 400/ 1568 batches | lr 30.00 | ms/batch 295.19 | loss 4.70 | ppl 110.04
| epoch 78 | 600/ 1568 batches | lr 30.00 | ms/batch 301.59 | loss 4.68 | ppl 107.40
| epoch 78 | 800/ 1568 batches | lr 30.00 | ms/batch 300.03 | loss 4.68 | ppl 108.24
| epoch 78 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.82 | loss 4.73 | ppl 112.88
| epoch 78 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.06 | loss 4.67 | ppl 106.69
| epoch 78 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.79 | loss 4.68 | ppl 107.52
| epoch 78 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.52 | loss 4.64 | ppl 103.89
-----------------------------------------------------------------------------------------
| end of epoch 78 | time: 513.67s | valid loss 4.87 | valid ppl 130.37
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 79 | 200/ 1568 batches | lr 30.00 | ms/batch 296.02 | loss 4.70 | ppl 110.14
| epoch 79 | 400/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 4.70 | ppl 110.30
| epoch 79 | 600/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.69 | ppl 108.33
| epoch 79 | 800/ 1568 batches | lr 30.00 | ms/batch 305.86 | loss 4.67 | ppl 106.49
| epoch 79 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.72 | loss 4.72 | ppl 112.18
| epoch 79 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.18 | loss 4.69 | ppl 108.34
| epoch 79 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.44 | loss 4.66 | ppl 105.58
| epoch 79 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.64 | ppl 103.90
-----------------------------------------------------------------------------------------
| end of epoch 79 | time: 513.99s | valid loss 4.87 | valid ppl 130.11
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 80 | 200/ 1568 batches | lr 30.00 | ms/batch 305.58 | loss 4.71 | ppl 110.82
| epoch 80 | 400/ 1568 batches | lr 30.00 | ms/batch 300.76 | loss 4.68 | ppl 107.53
| epoch 80 | 600/ 1568 batches | lr 30.00 | ms/batch 304.82 | loss 4.66 | ppl 105.25
| epoch 80 | 800/ 1568 batches | lr 30.00 | ms/batch 303.94 | loss 4.68 | ppl 107.38
| epoch 80 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.70 | ppl 110.17
| epoch 80 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.55 | loss 4.66 | ppl 105.44
| epoch 80 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.07 | loss 4.66 | ppl 105.78
| epoch 80 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.39 | loss 4.62 | ppl 101.54
-----------------------------------------------------------------------------------------
| end of epoch 80 | time: 513.59s | valid loss 4.87 | valid ppl 129.86
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 81 | 200/ 1568 batches | lr 30.00 | ms/batch 300.18 | loss 4.69 | ppl 108.86
| epoch 81 | 400/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.69 | ppl 108.86
| epoch 81 | 600/ 1568 batches | lr 30.00 | ms/batch 296.61 | loss 4.65 | ppl 104.88
| epoch 81 | 800/ 1568 batches | lr 30.00 | ms/batch 297.22 | loss 4.67 | ppl 106.47
| epoch 81 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.15 | loss 4.69 | ppl 109.02
| epoch 81 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.66 | ppl 105.30
| epoch 81 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.65 | ppl 104.28
| epoch 81 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.63 | ppl 102.76
-----------------------------------------------------------------------------------------
| end of epoch 81 | time: 514.26s | valid loss 4.86 | valid ppl 129.61
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 82 | 200/ 1568 batches | lr 30.00 | ms/batch 301.64 | loss 4.69 | ppl 108.64
| epoch 82 | 400/ 1568 batches | lr 30.00 | ms/batch 302.22 | loss 4.68 | ppl 107.48
| epoch 82 | 600/ 1568 batches | lr 30.00 | ms/batch 300.09 | loss 4.64 | ppl 103.35
| epoch 82 | 800/ 1568 batches | lr 30.00 | ms/batch 299.65 | loss 4.65 | ppl 104.57
| epoch 82 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.44 | loss 4.71 | ppl 110.83
| epoch 82 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.49 | loss 4.66 | ppl 106.01
| epoch 82 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.71 | loss 4.63 | ppl 102.66
| epoch 82 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.73 | loss 4.65 | ppl 104.16
-----------------------------------------------------------------------------------------
| end of epoch 82 | time: 514.08s | valid loss 4.86 | valid ppl 129.37
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 83 | 200/ 1568 batches | lr 30.00 | ms/batch 304.82 | loss 4.70 | ppl 109.77
| epoch 83 | 400/ 1568 batches | lr 30.00 | ms/batch 303.70 | loss 4.68 | ppl 108.17
| epoch 83 | 600/ 1568 batches | lr 30.00 | ms/batch 300.34 | loss 4.63 | ppl 102.67
| epoch 83 | 800/ 1568 batches | lr 30.00 | ms/batch 304.41 | loss 4.64 | ppl 103.99
| epoch 83 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.20 | loss 4.69 | ppl 108.47
| epoch 83 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.85 | loss 4.65 | ppl 104.19
| epoch 83 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.79 | loss 4.63 | ppl 102.24
| epoch 83 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.60 | ppl 99.55
-----------------------------------------------------------------------------------------
| end of epoch 83 | time: 515.33s | valid loss 4.86 | valid ppl 129.13
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 84 | 200/ 1568 batches | lr 30.00 | ms/batch 304.22 | loss 4.68 | ppl 107.75
| epoch 84 | 400/ 1568 batches | lr 30.00 | ms/batch 300.66 | loss 4.68 | ppl 107.38
| epoch 84 | 600/ 1568 batches | lr 30.00 | ms/batch 301.65 | loss 4.64 | ppl 103.43
| epoch 84 | 800/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 4.64 | ppl 103.37
| epoch 84 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.48 | loss 4.70 | ppl 109.45
| epoch 84 | 1200/ 1568 batches | lr 30.00 | ms/batch 307.64 | loss 4.64 | ppl 104.02
| epoch 84 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.29 | loss 4.63 | ppl 102.08
| epoch 84 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.04 | loss 4.61 | ppl 100.66
-----------------------------------------------------------------------------------------
| end of epoch 84 | time: 513.63s | valid loss 4.86 | valid ppl 128.90
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 85 | 200/ 1568 batches | lr 30.00 | ms/batch 301.68 | loss 4.69 | ppl 108.62
| epoch 85 | 400/ 1568 batches | lr 30.00 | ms/batch 302.63 | loss 4.65 | ppl 104.77
| epoch 85 | 600/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.63 | ppl 102.67
| epoch 85 | 800/ 1568 batches | lr 30.00 | ms/batch 301.05 | loss 4.65 | ppl 104.25
| epoch 85 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.47 | loss 4.68 | ppl 108.21
| epoch 85 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.59 | loss 4.65 | ppl 104.39
| epoch 85 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.27 | loss 4.61 | ppl 100.64
| epoch 85 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.69 | loss 4.60 | ppl 99.44
-----------------------------------------------------------------------------------------
| end of epoch 85 | time: 513.57s | valid loss 4.86 | valid ppl 128.67
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 86 | 200/ 1568 batches | lr 30.00 | ms/batch 303.36 | loss 4.67 | ppl 106.73
| epoch 86 | 400/ 1568 batches | lr 30.00 | ms/batch 301.80 | loss 4.65 | ppl 104.99
| epoch 86 | 600/ 1568 batches | lr 30.00 | ms/batch 299.50 | loss 4.63 | ppl 102.64
| epoch 86 | 800/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.63 | ppl 102.91
| epoch 86 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 4.68 | ppl 107.90
| epoch 86 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.81 | loss 4.61 | ppl 100.23
| epoch 86 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.74 | loss 4.62 | ppl 101.95
| epoch 86 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.18 | loss 4.59 | ppl 98.90
-----------------------------------------------------------------------------------------
| end of epoch 86 | time: 513.81s | valid loss 4.86 | valid ppl 128.46
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 87 | 200/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.67 | ppl 106.54
| epoch 87 | 400/ 1568 batches | lr 30.00 | ms/batch 300.95 | loss 4.65 | ppl 104.45
| epoch 87 | 600/ 1568 batches | lr 30.00 | ms/batch 302.23 | loss 4.64 | ppl 103.94
| epoch 87 | 800/ 1568 batches | lr 30.00 | ms/batch 298.47 | loss 4.63 | ppl 102.62
| epoch 87 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.17 | loss 4.67 | ppl 107.22
| epoch 87 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.83 | loss 4.62 | ppl 101.05
| epoch 87 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.89 | loss 4.61 | ppl 100.15
| epoch 87 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.60 | ppl 99.59
-----------------------------------------------------------------------------------------
| end of epoch 87 | time: 513.71s | valid loss 4.85 | valid ppl 128.26
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 88 | 200/ 1568 batches | lr 30.00 | ms/batch 302.14 | loss 4.66 | ppl 105.30
| epoch 88 | 400/ 1568 batches | lr 30.00 | ms/batch 302.00 | loss 4.64 | ppl 103.84
| epoch 88 | 600/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.62 | ppl 101.29
| epoch 88 | 800/ 1568 batches | lr 30.00 | ms/batch 301.94 | loss 4.61 | ppl 100.55
| epoch 88 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.89 | loss 4.65 | ppl 105.10
| epoch 88 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.14 | loss 4.61 | ppl 100.56
| epoch 88 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.35 | loss 4.61 | ppl 100.60
| epoch 88 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.47 | loss 4.60 | ppl 99.21
-----------------------------------------------------------------------------------------
| end of epoch 88 | time: 515.36s | valid loss 4.85 | valid ppl 128.06
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 89 | 200/ 1568 batches | lr 30.00 | ms/batch 305.14 | loss 4.64 | ppl 103.13
| epoch 89 | 400/ 1568 batches | lr 30.00 | ms/batch 298.05 | loss 4.65 | ppl 104.22
| epoch 89 | 600/ 1568 batches | lr 30.00 | ms/batch 299.87 | loss 4.61 | ppl 100.21
| epoch 89 | 800/ 1568 batches | lr 30.00 | ms/batch 299.69 | loss 4.60 | ppl 99.96
| epoch 89 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.71 | loss 4.67 | ppl 106.18
| epoch 89 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.75 | loss 4.63 | ppl 102.15
| epoch 89 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.45 | loss 4.60 | ppl 99.84
| epoch 89 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.40 | loss 4.57 | ppl 96.37
-----------------------------------------------------------------------------------------
| end of epoch 89 | time: 513.75s | valid loss 4.85 | valid ppl 127.87
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 90 | 200/ 1568 batches | lr 30.00 | ms/batch 305.69 | loss 4.63 | ppl 103.02
| epoch 90 | 400/ 1568 batches | lr 30.00 | ms/batch 293.48 | loss 4.63 | ppl 102.79
| epoch 90 | 600/ 1568 batches | lr 30.00 | ms/batch 300.62 | loss 4.60 | ppl 99.15
| epoch 90 | 800/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.61 | ppl 100.73
| epoch 90 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.27 | loss 4.66 | ppl 105.26
| epoch 90 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.20 | loss 4.61 | ppl 100.55
| epoch 90 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.59 | ppl 98.84
| epoch 90 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.30 | loss 4.57 | ppl 96.93
-----------------------------------------------------------------------------------------
| end of epoch 90 | time: 513.97s | valid loss 4.85 | valid ppl 127.68
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 91 | 200/ 1568 batches | lr 30.00 | ms/batch 301.24 | loss 4.63 | ppl 102.84
| epoch 91 | 400/ 1568 batches | lr 30.00 | ms/batch 300.69 | loss 4.61 | ppl 100.85
| epoch 91 | 600/ 1568 batches | lr 30.00 | ms/batch 305.22 | loss 4.59 | ppl 98.87
| epoch 91 | 800/ 1568 batches | lr 30.00 | ms/batch 294.57 | loss 4.61 | ppl 100.73
| epoch 91 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.98 | loss 4.66 | ppl 105.70
| epoch 91 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.88 | loss 4.59 | ppl 98.57
| epoch 91 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.36 | loss 4.57 | ppl 96.91
| epoch 91 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.97 | loss 4.60 | ppl 99.05
-----------------------------------------------------------------------------------------
| end of epoch 91 | time: 514.18s | valid loss 4.85 | valid ppl 127.49
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 92 | 200/ 1568 batches | lr 30.00 | ms/batch 301.29 | loss 4.65 | ppl 104.58
| epoch 92 | 400/ 1568 batches | lr 30.00 | ms/batch 293.98 | loss 4.62 | ppl 101.41
| epoch 92 | 600/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.59 | ppl 98.39
| epoch 92 | 800/ 1568 batches | lr 30.00 | ms/batch 306.21 | loss 4.61 | ppl 100.09
| epoch 92 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.21 | loss 4.65 | ppl 104.17
| epoch 92 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.07 | loss 4.59 | ppl 98.10
| epoch 92 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.88 | loss 4.59 | ppl 98.52
| epoch 92 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.60 | loss 4.55 | ppl 94.90
-----------------------------------------------------------------------------------------
| end of epoch 92 | time: 514.18s | valid loss 4.85 | valid ppl 127.31
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 93 | 200/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.63 | ppl 102.74
| epoch 93 | 400/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 4.62 | ppl 101.39
| epoch 93 | 600/ 1568 batches | lr 30.00 | ms/batch 303.97 | loss 4.59 | ppl 98.38
| epoch 93 | 800/ 1568 batches | lr 30.00 | ms/batch 299.31 | loss 4.59 | ppl 98.44
| epoch 93 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.28 | loss 4.64 | ppl 103.61
| epoch 93 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.30 | loss 4.60 | ppl 99.85
| epoch 93 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.46 | loss 4.57 | ppl 96.54
| epoch 93 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 4.57 | ppl 96.23
-----------------------------------------------------------------------------------------
| end of epoch 93 | time: 514.01s | valid loss 4.85 | valid ppl 127.13
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 94 | 200/ 1568 batches | lr 30.00 | ms/batch 300.99 | loss 4.62 | ppl 101.64
| epoch 94 | 400/ 1568 batches | lr 30.00 | ms/batch 304.00 | loss 4.61 | ppl 100.82
| epoch 94 | 600/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.58 | ppl 97.77
| epoch 94 | 800/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.60 | ppl 99.33
| epoch 94 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.35 | loss 4.63 | ppl 102.33
| epoch 94 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.21 | loss 4.59 | ppl 98.11
| epoch 94 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.58 | loss 4.57 | ppl 96.82
| epoch 94 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.57 | ppl 96.69
-----------------------------------------------------------------------------------------
| end of epoch 94 | time: 514.02s | valid loss 4.84 | valid ppl 126.96
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 95 | 200/ 1568 batches | lr 30.00 | ms/batch 303.56 | loss 4.61 | ppl 100.46
| epoch 95 | 400/ 1568 batches | lr 30.00 | ms/batch 300.09 | loss 4.62 | ppl 101.65
| epoch 95 | 600/ 1568 batches | lr 30.00 | ms/batch 298.61 | loss 4.57 | ppl 96.08
| epoch 95 | 800/ 1568 batches | lr 30.00 | ms/batch 305.86 | loss 4.59 | ppl 98.19
| epoch 95 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.98 | loss 4.63 | ppl 102.81
| epoch 95 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.83 | loss 4.59 | ppl 98.10
| epoch 95 | 1400/ 1568 batches | lr 30.00 | ms/batch 294.97 | loss 4.56 | ppl 95.66
| epoch 95 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.80 | loss 4.57 | ppl 96.09
-----------------------------------------------------------------------------------------
| end of epoch 95 | time: 514.07s | valid loss 4.84 | valid ppl 126.80
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 96 | 200/ 1568 batches | lr 30.00 | ms/batch 304.66 | loss 4.62 | ppl 101.12
| epoch 96 | 400/ 1568 batches | lr 30.00 | ms/batch 296.73 | loss 4.61 | ppl 100.29
| epoch 96 | 600/ 1568 batches | lr 30.00 | ms/batch 303.99 | loss 4.58 | ppl 97.62
| epoch 96 | 800/ 1568 batches | lr 30.00 | ms/batch 301.37 | loss 4.57 | ppl 96.75
| epoch 96 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.38 | loss 4.63 | ppl 102.67
| epoch 96 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.99 | loss 4.59 | ppl 98.68
| epoch 96 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.65 | loss 4.56 | ppl 95.58
| epoch 96 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.16 | loss 4.56 | ppl 95.84
-----------------------------------------------------------------------------------------
| end of epoch 96 | time: 513.93s | valid loss 4.84 | valid ppl 126.64
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 97 | 200/ 1568 batches | lr 30.00 | ms/batch 304.47 | loss 4.60 | ppl 99.90
| epoch 97 | 400/ 1568 batches | lr 30.00 | ms/batch 298.69 | loss 4.60 | ppl 99.48
| epoch 97 | 600/ 1568 batches | lr 30.00 | ms/batch 300.24 | loss 4.57 | ppl 96.49
| epoch 97 | 800/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.58 | ppl 97.07
| epoch 97 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.63 | loss 4.61 | ppl 100.83
| epoch 97 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.58 | ppl 97.07
| epoch 97 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.73 | loss 4.56 | ppl 96.03
| epoch 97 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.68 | loss 4.54 | ppl 93.29
-----------------------------------------------------------------------------------------
| end of epoch 97 | time: 513.78s | valid loss 4.84 | valid ppl 126.48
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 98 | 200/ 1568 batches | lr 30.00 | ms/batch 296.17 | loss 4.62 | ppl 101.12
| epoch 98 | 400/ 1568 batches | lr 30.00 | ms/batch 304.73 | loss 4.61 | ppl 100.12
| epoch 98 | 600/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.56 | ppl 95.95
| epoch 98 | 800/ 1568 batches | lr 30.00 | ms/batch 301.30 | loss 4.56 | ppl 95.99
| epoch 98 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.62 | ppl 101.53
| epoch 98 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.92 | loss 4.57 | ppl 96.56
| epoch 98 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.33 | loss 4.56 | ppl 95.25
| epoch 98 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.53 | ppl 92.71
-----------------------------------------------------------------------------------------
| end of epoch 98 | time: 514.04s | valid loss 4.84 | valid ppl 126.33
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 99 | 200/ 1568 batches | lr 30.00 | ms/batch 299.41 | loss 4.61 | ppl 100.91
| epoch 99 | 400/ 1568 batches | lr 30.00 | ms/batch 297.92 | loss 4.58 | ppl 97.24
| epoch 99 | 600/ 1568 batches | lr 30.00 | ms/batch 305.17 | loss 4.56 | ppl 95.46
| epoch 99 | 800/ 1568 batches | lr 30.00 | ms/batch 303.33 | loss 4.57 | ppl 96.90
| epoch 99 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.61 | ppl 100.48
| epoch 99 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.88 | loss 4.56 | ppl 95.34
| epoch 99 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.73 | loss 4.56 | ppl 95.40
| epoch 99 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.07 | loss 4.53 | ppl 92.84
-----------------------------------------------------------------------------------------
| end of epoch 99 | time: 514.00s | valid loss 4.84 | valid ppl 126.18
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 100 | 200/ 1568 batches | lr 30.00 | ms/batch 302.41 | loss 4.59 | ppl 98.84
| epoch 100 | 400/ 1568 batches | lr 30.00 | ms/batch 298.67 | loss 4.56 | ppl 95.92
| epoch 100 | 600/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.57 | ppl 96.48
| epoch 100 | 800/ 1568 batches | lr 30.00 | ms/batch 299.87 | loss 4.56 | ppl 95.23
| epoch 100 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.63 | loss 4.61 | ppl 100.24
| epoch 100 | 1200/ 1568 batches | lr 30.00 | ms/batch 307.50 | loss 4.57 | ppl 96.12
| epoch 100 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.78 | loss 4.55 | ppl 94.63
| epoch 100 | 1600/ 1568 batches | lr 30.00 | ms/batch 294.57 | loss 4.52 | ppl 92.06
-----------------------------------------------------------------------------------------
| end of epoch 100 | time: 514.30s | valid loss 4.84 | valid ppl 126.03
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 101 | 200/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.60 | ppl 99.41
| epoch 101 | 400/ 1568 batches | lr 30.00 | ms/batch 299.07 | loss 4.59 | ppl 98.14
| epoch 101 | 600/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.56 | ppl 95.56
| epoch 101 | 800/ 1568 batches | lr 30.00 | ms/batch 303.96 | loss 4.56 | ppl 96.04
| epoch 101 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.56 | loss 4.59 | ppl 98.83
| epoch 101 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.88 | loss 4.56 | ppl 95.93
| epoch 101 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.93 | loss 4.55 | ppl 94.25
| epoch 101 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.60 | loss 4.52 | ppl 92.27
-----------------------------------------------------------------------------------------
| end of epoch 101 | time: 513.46s | valid loss 4.84 | valid ppl 125.89
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 102 | 200/ 1568 batches | lr 30.00 | ms/batch 303.34 | loss 4.59 | ppl 98.10
| epoch 102 | 400/ 1568 batches | lr 30.00 | ms/batch 301.39 | loss 4.58 | ppl 97.28
| epoch 102 | 600/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.55 | ppl 94.94
| epoch 102 | 800/ 1568 batches | lr 30.00 | ms/batch 300.33 | loss 4.56 | ppl 95.23
| epoch 102 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.60 | ppl 99.28
| epoch 102 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.68 | loss 4.55 | ppl 94.24
| epoch 102 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.45 | loss 4.54 | ppl 93.81
| epoch 102 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.48 | loss 4.52 | ppl 92.25
-----------------------------------------------------------------------------------------
| end of epoch 102 | time: 513.68s | valid loss 4.83 | valid ppl 125.76
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 103 | 200/ 1568 batches | lr 30.00 | ms/batch 309.96 | loss 4.59 | ppl 98.29 | epoch 103 | 400/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.57 | ppl 96.28 | epoch 103 | 600/ 1568 batches | lr 30.00 | ms/batch 298.54 | loss 4.53 | ppl 92.62 | epoch 103 | 800/ 1568 batches | lr 30.00 | ms/batch 303.97 | loss 4.57 | ppl 96.56 | epoch 103 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.56 | loss 4.59 | ppl 98.07 | epoch 103 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.09 | loss 4.55 | ppl 94.96 | epoch 103 | 1400/ 1568 batches | lr 30.00 | ms/batch 296.92 | loss 4.52 | ppl 91.80 | epoch 103 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.53 | ppl 92.36 ----------------------------------------------------------------------------------------- | end of epoch 103 | time: 513.65s | valid loss 4.83 | valid ppl 125.63 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 104 | 200/ 1568 batches | lr 30.00 | ms/batch 297.75 | loss 4.58 | ppl 97.69 | epoch 104 | 400/ 1568 batches | lr 30.00 | ms/batch 300.46 | loss 4.57 | ppl 96.26 | epoch 104 | 600/ 1568 batches | lr 30.00 | ms/batch 304.08 | loss 4.54 | ppl 93.35 | epoch 104 | 800/ 1568 batches | lr 30.00 | ms/batch 302.67 | loss 4.53 | ppl 92.82 | epoch 104 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.59 | ppl 98.37 | epoch 104 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.51 | loss 4.54 | ppl 93.66 | epoch 104 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.63 | loss 4.54 | ppl 93.61 | epoch 104 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.85 | loss 4.52 | ppl 91.53 ----------------------------------------------------------------------------------------- | end of epoch 104 | time: 514.36s | valid loss 4.83 | valid ppl 125.50 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 105 | 200/ 1568 batches | lr 30.00 | ms/batch 306.64 | loss 4.58 | ppl 97.16 | epoch 105 | 400/ 1568 batches | lr 30.00 | ms/batch 298.86 | loss 4.55 | ppl 94.45 | epoch 105 | 600/ 1568 batches | lr 30.00 | ms/batch 299.93 | loss 4.53 | ppl 92.99 | epoch 105 | 800/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.55 | ppl 94.33 | epoch 105 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.59 | ppl 98.28 | epoch 105 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.35 | loss 4.54 | ppl 93.61 | epoch 105 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.54 | ppl 93.23 | epoch 105 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.72 | loss 4.51 | ppl 91.36 ----------------------------------------------------------------------------------------- | end of epoch 105 | time: 514.03s | valid loss 4.83 | valid ppl 125.37 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 106 | 200/ 1568 batches | lr 30.00 | ms/batch 298.20 | loss 4.57 | ppl 96.32 | epoch 106 | 400/ 1568 batches | lr 30.00 | ms/batch 297.05 | loss 4.55 | ppl 94.78 | epoch 106 | 600/ 1568 batches | lr 30.00 | ms/batch 301.66 | loss 4.52 | ppl 92.16 | epoch 106 | 800/ 1568 batches | lr 30.00 | ms/batch 298.80 | loss 4.52 | ppl 91.89 | epoch 106 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.60 | ppl 99.66 | epoch 106 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.54 | ppl 93.62 | epoch 106 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.14 | loss 4.52 | ppl 91.70 | epoch 106 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.04 | loss 4.50 | ppl 89.76 ----------------------------------------------------------------------------------------- | end of epoch 106 | time: 514.19s | valid loss 4.83 | valid ppl 125.25 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 107 | 200/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.57 | ppl 96.27 | epoch 107 | 400/ 1568 batches | lr 30.00 | ms/batch 300.85 | loss 4.56 | ppl 95.23 | epoch 107 | 600/ 1568 batches | lr 30.00 | ms/batch 299.21 | loss 4.52 | ppl 91.92 | epoch 107 | 800/ 1568 batches | lr 30.00 | ms/batch 303.07 | loss 4.52 | ppl 91.86 | epoch 107 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.04 | loss 4.59 | ppl 98.72 | epoch 107 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.30 | loss 4.52 | ppl 91.78 | epoch 107 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.52 | ppl 91.59 | epoch 107 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.51 | loss 4.51 | ppl 90.98 ----------------------------------------------------------------------------------------- | end of epoch 107 | time: 514.03s | valid loss 4.83 | valid ppl 125.12 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 108 | 200/ 1568 batches | lr 30.00 | ms/batch 304.56 | loss 4.56 | ppl 95.93 | epoch 108 | 400/ 1568 batches | lr 30.00 | ms/batch 300.42 | loss 4.54 | ppl 94.14 | epoch 108 | 600/ 1568 batches | lr 30.00 | ms/batch 300.72 | loss 4.51 | ppl 90.61 | epoch 108 | 800/ 1568 batches | lr 30.00 | ms/batch 301.12 | loss 4.52 | ppl 91.74 | epoch 108 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.77 | loss 4.58 | ppl 97.96 | epoch 108 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.61 | loss 4.53 | ppl 92.96 | epoch 108 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.50 | ppl 89.95 | epoch 108 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.90 | loss 4.49 | ppl 89.46 ----------------------------------------------------------------------------------------- | end of epoch 108 | time: 514.06s | valid loss 4.83 | valid ppl 125.01 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 109 | 200/ 1568 batches | lr 30.00 | ms/batch 306.50 | loss 4.57 | ppl 96.99 | epoch 109 | 400/ 1568 batches | lr 30.00 | ms/batch 295.97 | loss 4.55 | ppl 95.01 | epoch 109 | 600/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.52 | ppl 91.38 | epoch 109 | 800/ 1568 batches | lr 30.00 | ms/batch 301.66 | loss 4.53 | ppl 92.63 | epoch 109 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.67 | loss 4.58 | ppl 97.42 | epoch 109 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.52 | ppl 91.50 | epoch 109 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.37 | loss 4.51 | ppl 91.21 | epoch 109 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.27 | loss 4.49 | ppl 88.94 ----------------------------------------------------------------------------------------- | end of epoch 109 | time: 513.99s | valid loss 4.83 | valid ppl 124.89 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 110 | 200/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.55 | ppl 94.31 | epoch 110 | 400/ 1568 batches | lr 30.00 | ms/batch 299.54 | loss 4.54 | ppl 93.68 | epoch 110 | 600/ 1568 batches | lr 30.00 | ms/batch 302.93 | loss 4.50 | ppl 90.36 | epoch 110 | 800/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.52 | ppl 91.96 | epoch 110 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.92 | loss 4.56 | ppl 95.98 | epoch 110 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.66 | loss 4.52 | ppl 92.22 | epoch 110 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.66 | loss 4.51 | ppl 91.27 | epoch 110 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.42 | loss 4.50 | ppl 89.67 ----------------------------------------------------------------------------------------- | end of epoch 110 | time: 513.93s | valid loss 4.83 | valid ppl 124.79 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 111 | 200/ 1568 batches | lr 30.00 | ms/batch 304.14 | loss 4.55 | ppl 94.77 | epoch 111 | 400/ 1568 batches | lr 30.00 | ms/batch 304.08 | loss 4.54 | ppl 93.65 | epoch 111 | 600/ 1568 batches | lr 30.00 | ms/batch 297.66 | loss 4.52 | ppl 92.11 | epoch 111 | 800/ 1568 batches | lr 30.00 | ms/batch 302.12 | loss 4.52 | ppl 91.60 | epoch 111 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.78 | loss 4.56 | ppl 95.80 | epoch 111 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.52 | ppl 91.88 | epoch 111 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.21 | loss 4.52 | ppl 91.79 | epoch 111 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.76 | loss 4.49 | ppl 89.36 ----------------------------------------------------------------------------------------- | end of epoch 111 | time: 513.62s | valid loss 4.83 | valid ppl 124.68 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 112 | 200/ 1568 batches | lr 30.00 | ms/batch 298.80 | loss 4.54 | ppl 93.49 | epoch 112 | 400/ 1568 batches | lr 30.00 | ms/batch 299.56 | loss 4.52 | ppl 91.44 | epoch 112 | 600/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.51 | ppl 91.27 | epoch 112 | 800/ 1568 batches | lr 30.00 | ms/batch 301.02 | loss 4.50 | ppl 90.21 | epoch 112 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.89 | loss 4.57 | ppl 96.57 | epoch 112 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.51 | ppl 91.04 | epoch 112 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.47 | loss 4.51 | ppl 90.53 | epoch 112 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.47 | ppl 87.73 ----------------------------------------------------------------------------------------- | end of epoch 112 | time: 513.74s | valid loss 4.82 | valid ppl 124.58 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 113 | 200/ 1568 batches | lr 30.00 | ms/batch 303.61 | loss 4.53 | ppl 93.21 | epoch 113 | 400/ 1568 batches | lr 30.00 | ms/batch 299.52 | loss 4.52 | ppl 91.97 | epoch 113 | 600/ 1568 batches | lr 30.00 | ms/batch 300.96 | loss 4.50 | ppl 90.28 | epoch 113 | 800/ 1568 batches | lr 30.00 | ms/batch 302.42 | loss 4.50 | ppl 90.38 | epoch 113 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.53 | loss 4.56 | ppl 95.65 | epoch 113 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.52 | ppl 91.53 | epoch 113 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.10 | loss 4.53 | ppl 92.32 | epoch 113 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.77 | loss 4.48 | ppl 88.41 ----------------------------------------------------------------------------------------- | end of epoch 113 | time: 513.43s | valid loss 4.82 | valid ppl 124.48 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 114 | 200/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.54 | ppl 93.77 | epoch 114 | 400/ 1568 batches | lr 30.00 | ms/batch 297.57 | loss 4.52 | ppl 92.05 | epoch 114 | 600/ 1568 batches | lr 30.00 | ms/batch 298.95 | loss 4.52 | ppl 92.01 | epoch 114 | 800/ 1568 batches | lr 30.00 | ms/batch 304.05 | loss 4.50 | ppl 89.81 | epoch 114 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.56 | ppl 95.36 | epoch 114 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.51 | ppl 91.06 | epoch 114 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.49 | ppl 88.79 | epoch 114 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.02 | loss 4.45 | ppl 85.65 ----------------------------------------------------------------------------------------- | end of epoch 114 | time: 514.03s | valid loss 4.82 | valid ppl 124.38 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 115 | 200/ 1568 batches | lr 30.00 | ms/batch 305.06 | loss 4.53 | ppl 92.98 | epoch 115 | 400/ 1568 batches | lr 30.00 | ms/batch 299.23 | loss 4.52 | ppl 91.68 | epoch 115 | 600/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.50 | ppl 90.14 | epoch 115 | 800/ 1568 batches | lr 30.00 | ms/batch 303.51 | loss 4.51 | ppl 90.77 | epoch 115 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.53 | loss 4.53 | ppl 93.21 | epoch 115 | 1200/ 1568 batches | lr 30.00 | ms/batch 307.10 | loss 4.51 | ppl 90.49 | epoch 115 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.72 | loss 4.51 | ppl 90.57 | epoch 115 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.87 | loss 4.46 | ppl 86.52 ----------------------------------------------------------------------------------------- | end of epoch 115 | time: 514.04s | valid loss 4.82 | valid ppl 124.29 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 116 | 200/ 1568 batches | lr 30.00 | ms/batch 311.18 | loss 4.52 | ppl 92.21 | epoch 116 | 400/ 1568 batches | lr 30.00 | ms/batch 299.64 | loss 4.52 | ppl 91.38 | epoch 116 | 600/ 1568 batches | lr 30.00 | ms/batch 297.30 | loss 4.48 | ppl 88.12 | epoch 116 | 800/ 1568 batches | lr 30.00 | ms/batch 303.84 | loss 4.51 | ppl 91.22 | epoch 116 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.01 | loss 4.56 | ppl 95.31 | epoch 116 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.50 | ppl 89.68 | epoch 116 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.50 | ppl 89.80 | epoch 116 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.37 | loss 4.46 | ppl 86.70 ----------------------------------------------------------------------------------------- | end of epoch 116 | time: 513.78s | valid loss 4.82 | valid ppl 124.18 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 117 | 200/ 1568 batches | lr 30.00 | ms/batch 303.03 | loss 4.53 | ppl 93.04 | epoch 117 | 400/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.51 | ppl 91.37 | epoch 117 | 600/ 1568 batches | lr 30.00 | ms/batch 299.80 | loss 4.48 | ppl 87.87 | epoch 117 | 800/ 1568 batches | lr 30.00 | ms/batch 299.83 | loss 4.50 | ppl 89.60 | epoch 117 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.55 | ppl 94.19 | epoch 117 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.34 | loss 4.49 | ppl 88.89 | epoch 117 | 1400/ 1568 batches | lr 30.00 | ms/batch 307.14 | loss 4.50 | ppl 90.23 | epoch 117 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.17 | loss 4.45 | ppl 85.86 ----------------------------------------------------------------------------------------- | end of epoch 117 | time: 513.99s | valid loss 4.82 | valid ppl 124.09 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 118 | 200/ 1568 batches | lr 30.00 | ms/batch 305.68 | loss 4.54 | ppl 93.38 | epoch 118 | 400/ 1568 batches | lr 30.00 | ms/batch 303.69 | loss 4.50 | ppl 90.41 | epoch 118 | 600/ 1568 batches | lr 30.00 | ms/batch 301.10 | loss 4.49 | ppl 89.39 | epoch 118 | 800/ 1568 batches | lr 30.00 | ms/batch 302.26 | loss 4.50 | ppl 90.36 | epoch 118 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.35 | loss 4.54 | ppl 93.95 | epoch 118 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.81 | loss 4.48 | ppl 87.95 | epoch 118 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.70 | loss 4.49 | ppl 89.50 | epoch 118 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.45 | ppl 85.29 ----------------------------------------------------------------------------------------- | end of epoch 118 | time: 513.79s | valid loss 4.82 | valid ppl 123.99 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 119 | 200/ 1568 batches | lr 30.00 | ms/batch 303.37 | loss 4.52 | ppl 91.59 | epoch 119 | 400/ 1568 batches | lr 30.00 | ms/batch 302.83 | loss 4.50 | ppl 90.35 | epoch 119 | 600/ 1568 batches | lr 30.00 | ms/batch 303.53 | loss 4.49 | ppl 89.38 | epoch 119 | 800/ 1568 batches | lr 30.00 | ms/batch 306.41 | loss 4.49 | ppl 89.23 | epoch 119 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.05 | loss 4.54 | ppl 93.56 | epoch 119 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.47 | loss 4.47 | ppl 87.20 | epoch 119 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.34 | loss 4.49 | ppl 88.90 | epoch 119 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.16 | loss 4.44 | ppl 84.41 ----------------------------------------------------------------------------------------- | end of epoch 119 | time: 513.72s | valid loss 4.82 | valid ppl 123.90 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 120 | 200/ 1568 batches | lr 30.00 | ms/batch 303.79 | loss 4.53 | ppl 92.43 | epoch 120 | 400/ 1568 batches | lr 30.00 | ms/batch 301.13 | loss 4.50 | ppl 89.71 | epoch 120 | 600/ 1568 batches | lr 30.00 | ms/batch 301.64 | loss 4.50 | ppl 90.06 | epoch 120 | 800/ 1568 batches | lr 30.00 | ms/batch 296.75 | loss 4.47 | ppl 87.32 | epoch 120 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.55 | ppl 94.84 | epoch 120 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.18 | loss 4.47 | ppl 87.64 | epoch 120 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.97 | loss 4.47 | ppl 87.38 | epoch 120 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.03 | loss 4.47 | ppl 87.12 ----------------------------------------------------------------------------------------- | end of epoch 120 | time: 514.64s | valid loss 4.82 | valid ppl 123.80 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 121 | 200/ 1568 batches | lr 30.00 | ms/batch 299.87 | loss 4.51 | ppl 91.23 | epoch 121 | 400/ 1568 batches | lr 30.00 | ms/batch 297.95 | loss 4.49 | ppl 89.41 | epoch 121 | 600/ 1568 batches | lr 30.00 | ms/batch 299.50 | loss 4.48 | ppl 88.26 | epoch 121 | 800/ 1568 batches | lr 30.00 | ms/batch 299.60 | loss 4.48 | ppl 88.26 | epoch 121 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.80 | loss 4.54 | ppl 93.23 | epoch 121 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.01 | loss 4.48 | ppl 87.97 | epoch 121 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.71 | loss 4.47 | ppl 86.93 | epoch 121 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.44 | ppl 84.53 ----------------------------------------------------------------------------------------- | end of epoch 121 | time: 514.30s | valid loss 4.82 | valid ppl 123.71 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 122 | 200/ 1568 batches | lr 30.00 | ms/batch 301.33 | loss 4.51 | ppl 91.28 | epoch 122 | 400/ 1568 batches | lr 30.00 | ms/batch 303.77 | loss 4.49 | ppl 88.70 | epoch 122 | 600/ 1568 batches | lr 30.00 | ms/batch 303.35 | loss 4.48 | ppl 87.95 | epoch 122 | 800/ 1568 batches | lr 30.00 | ms/batch 298.09 | loss 4.48 | ppl 87.88 | epoch 122 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.60 | loss 4.52 | ppl 91.80 | epoch 122 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.50 | loss 4.47 | ppl 87.43 | epoch 122 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.95 | loss 4.46 | ppl 86.81 | epoch 122 | 1600/ 1568 batches | lr 30.00 | ms/batch 307.09 | loss 4.44 | ppl 84.77 ----------------------------------------------------------------------------------------- | end of epoch 122 | time: 513.69s | valid loss 4.82 | valid ppl 123.62 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 123 | 200/ 1568 batches | lr 30.00 | ms/batch 306.87 | loss 4.51 | ppl 91.24 | epoch 123 | 400/ 1568 batches | lr 30.00 | ms/batch 296.87 | loss 4.50 | ppl 89.72 | epoch 123 | 600/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.49 | ppl 88.70 | epoch 123 | 800/ 1568 batches | lr 30.00 | ms/batch 298.69 | loss 4.47 | ppl 87.66 | epoch 123 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.53 | ppl 92.65 | epoch 123 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.48 | ppl 87.92 | epoch 123 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.47 | ppl 87.45 | epoch 123 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.44 | ppl 85.08 ----------------------------------------------------------------------------------------- | end of epoch 123 | time: 514.21s | valid loss 4.82 | valid ppl 123.53 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 124 | 200/ 1568 batches | lr 30.00 | ms/batch 302.97 | loss 4.50 | ppl 89.82 | epoch 124 | 400/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.49 | ppl 89.55 | epoch 124 | 600/ 1568 batches | lr 30.00 | ms/batch 304.68 | loss 4.47 | ppl 87.02 | epoch 124 | 800/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.49 | ppl 88.85 | epoch 124 | 1000/ 1568 batches | lr 30.00 | ms/batch 293.57 | loss 4.52 | ppl 91.66 | epoch 124 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.47 | ppl 87.04 | epoch 124 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.05 | loss 4.47 | ppl 87.04 | epoch 124 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.92 | loss 4.45 | ppl 85.57 ----------------------------------------------------------------------------------------- | end of epoch 124 | time: 513.85s | valid loss 4.82 | valid ppl 123.45 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 125 | 200/ 1568 batches | lr 30.00 | ms/batch 305.24 | loss 4.51 | ppl 90.82 | epoch 125 | 400/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.48 | ppl 88.00 | epoch 125 | 600/ 1568 batches | lr 30.00 | ms/batch 301.25 | loss 4.47 | ppl 87.06 | epoch 125 | 800/ 1568 batches | lr 30.00 | ms/batch 303.57 | loss 4.46 | ppl 86.34 | epoch 125 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.33 | loss 4.53 | ppl 92.68 | epoch 125 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.20 | loss 4.45 | ppl 85.84 | epoch 125 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.63 | loss 4.47 | ppl 87.19 | epoch 125 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.72 | loss 4.43 | ppl 84.28 ----------------------------------------------------------------------------------------- | end of epoch 125 | time: 513.92s | valid loss 4.82 | valid ppl 123.37 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 126 | 200/ 1568 batches | lr 30.00 | ms/batch 305.87 | loss 4.51 | ppl 90.65 | epoch 126 | 400/ 1568 batches | lr 30.00 | ms/batch 297.86 | loss 4.47 | ppl 87.73 | epoch 126 | 600/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.45 | ppl 85.46 | epoch 126 | 800/ 1568 batches | lr 30.00 | ms/batch 297.90 | loss 4.46 | ppl 86.25 | epoch 126 | 1000/ 1568 batches | lr 30.00 | ms/batch 294.69 | loss 4.53 | ppl 92.31 | epoch 126 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.05 | loss 4.49 | ppl 89.19 | epoch 126 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.06 | loss 4.45 | ppl 85.21 | epoch 126 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.56 | loss 4.44 | ppl 84.66 ----------------------------------------------------------------------------------------- | end of epoch 126 | time: 513.81s | valid loss 4.81 | valid ppl 123.29 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 127 | 200/ 1568 batches | lr 30.00 | ms/batch 307.32 | loss 4.49 | ppl 89.00 | epoch 127 | 400/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.49 | ppl 88.79 | epoch 127 | 600/ 1568 batches | lr 30.00 | ms/batch 300.84 | loss 4.46 | ppl 86.57 | epoch 127 | 800/ 1568 batches | lr 30.00 | ms/batch 303.91 | loss 4.48 | ppl 88.22 | epoch 127 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.51 | ppl 91.03 | epoch 127 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.42 | loss 4.45 | ppl 85.51 | epoch 127 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.18 | loss 4.47 | ppl 87.56 | epoch 127 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.37 | loss 4.42 | ppl 82.93 ----------------------------------------------------------------------------------------- | end of epoch 127 | time: 513.58s | valid loss 4.81 | valid ppl 123.21 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 128 | 200/ 1568 batches | lr 30.00 | ms/batch 304.15 | loss 4.49 | ppl 89.28 | epoch 128 | 400/ 1568 batches | lr 30.00 | ms/batch 303.88 | loss 4.46 | ppl 86.06 | epoch 128 | 600/ 1568 batches | lr 30.00 | ms/batch 298.96 | loss 4.45 | ppl 85.48 | epoch 128 | 800/ 1568 batches | lr 30.00 | ms/batch 301.45 | loss 4.48 | ppl 88.17 | epoch 128 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.51 | ppl 91.33 | epoch 128 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.76 | loss 4.45 | ppl 86.02 | epoch 128 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.91 | loss 4.46 | ppl 86.25 | epoch 128 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.97 | loss 4.43 | ppl 83.94 ----------------------------------------------------------------------------------------- | end of epoch 128 | time: 513.55s | valid loss 4.81 | valid ppl 123.13 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 129 | 200/ 1568 batches | lr 30.00 | ms/batch 302.71 | loss 4.48 | ppl 88.35 | epoch 129 | 400/ 1568 batches | lr 30.00 | ms/batch 300.14 | loss 4.47 | ppl 87.08 | epoch 129 | 600/ 1568 batches | lr 30.00 | ms/batch 304.26 | loss 4.45 | ppl 85.23 | epoch 129 | 800/ 1568 batches | lr 30.00 | ms/batch 297.21 | loss 4.46 | ppl 86.40 | epoch 129 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.64 | loss 4.50 | ppl 89.72 | epoch 129 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.45 | ppl 85.37 | epoch 129 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.13 | loss 4.46 | ppl 86.92 | epoch 129 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.55 | loss 4.42 | ppl 83.34 ----------------------------------------------------------------------------------------- | end of epoch 129 | time: 513.73s | valid loss 4.81 | valid ppl 123.06 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 130 | 200/ 1568 batches | lr 30.00 | ms/batch 306.17 | loss 4.48 | ppl 88.12 | epoch 130 | 400/ 1568 batches | lr 30.00 | ms/batch 300.90 | loss 4.47 | ppl 87.53 | epoch 130 | 600/ 1568 batches | lr 30.00 | ms/batch 299.39 | loss 4.45 | ppl 85.59 | epoch 130 | 800/ 1568 batches | lr 30.00 | ms/batch 298.03 | loss 4.46 | ppl 86.56 | epoch 130 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.96 | loss 4.50 | ppl 90.36 | epoch 130 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.72 | loss 4.46 | ppl 86.50 | epoch 130 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.44 | ppl 84.96 | epoch 130 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 4.40 | ppl 81.65 ----------------------------------------------------------------------------------------- | end of epoch 130 | time: 513.91s | valid loss 4.81 | valid ppl 122.98 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 131 | 200/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.48 | ppl 88.34 | epoch 131 | 400/ 1568 batches | lr 30.00 | ms/batch 301.15 | loss 4.46 | ppl 86.88 | epoch 131 | 600/ 1568 batches | lr 30.00 | ms/batch 301.35 | loss 4.43 | ppl 83.64 | epoch 131 | 800/ 1568 batches | lr 30.00 | ms/batch 302.49 | loss 4.46 | ppl 86.73 | epoch 131 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.54 | loss 4.50 | ppl 90.08 | epoch 131 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.29 | loss 4.44 | ppl 84.84 | epoch 131 | 1400/ 1568 batches | lr 30.00 | ms/batch 308.10 | loss 4.45 | ppl 85.45 | epoch 131 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.10 | loss 4.40 | ppl 81.19 ----------------------------------------------------------------------------------------- | end of epoch 131 | time: 513.51s | valid loss 4.81 | valid ppl 122.91 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 132 | 200/ 1568 batches | lr 30.00 | ms/batch 305.74 | loss 4.48 | ppl 88.11 | epoch 132 | 400/ 1568 batches | lr 30.00 | ms/batch 301.14 | loss 4.46 | ppl 86.83 | epoch 132 | 600/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.46 | ppl 86.06 | epoch 132 | 800/ 1568 batches | lr 30.00 | ms/batch 298.90 | loss 4.47 | ppl 86.97 | epoch 132 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.90 | loss 4.51 | ppl 90.89 | epoch 132 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.42 | ppl 83.15 | epoch 132 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.96 | loss 4.42 | ppl 83.38 | epoch 132 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.36 | loss 4.41 | ppl 82.61 ----------------------------------------------------------------------------------------- | end of epoch 132 | time: 513.67s | valid loss 4.81 | valid ppl 122.84 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 133 | 200/ 1568 batches | lr 30.00 | ms/batch 306.27 | loss 4.47 | ppl 87.47 | epoch 133 | 400/ 1568 batches | lr 30.00 | ms/batch 299.92 | loss 4.46 | ppl 86.32 | epoch 133 | 600/ 1568 batches | lr 30.00 | ms/batch 302.47 | loss 4.44 | ppl 85.13 | epoch 133 | 800/ 1568 batches | lr 30.00 | ms/batch 302.94 | loss 4.45 | ppl 85.95 | epoch 133 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.36 | loss 4.48 | ppl 88.34 | epoch 133 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.10 | loss 4.45 | ppl 85.66 | epoch 133 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.83 | loss 4.44 | ppl 85.10 | epoch 133 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.33 | loss 4.40 | ppl 81.83 ----------------------------------------------------------------------------------------- | end of epoch 133 | time: 513.44s | valid loss 4.81 | valid ppl 122.77 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 134 | 200/ 1568 batches | lr 30.00 | ms/batch 301.05 | loss 4.47 | ppl 87.12 | epoch 134 | 400/ 1568 batches | lr 30.00 | ms/batch 294.41 | loss 4.44 | ppl 85.16 | epoch 134 | 600/ 1568 batches | lr 30.00 | ms/batch 301.64 | loss 4.43 | ppl 84.15 | epoch 134 | 800/ 1568 batches | lr 30.00 | ms/batch 300.20 | loss 4.46 | ppl 86.24 | epoch 134 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.50 | ppl 90.24 | epoch 134 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.78 | loss 4.44 | ppl 85.17 | epoch 134 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.40 | loss 4.43 | ppl 84.12 | epoch 134 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.73 | loss 4.42 | ppl 82.74 ----------------------------------------------------------------------------------------- | end of epoch 134 | time: 513.54s | valid loss 4.81 | valid ppl 122.70 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 135 | 200/ 1568 batches | lr 30.00 | ms/batch 303.50 | loss 4.49 | ppl 88.93 | epoch 135 | 400/ 1568 batches | lr 30.00 | ms/batch 300.67 | loss 4.46 | ppl 86.07 | epoch 135 | 600/ 1568 batches | lr 30.00 | ms/batch 304.79 | loss 4.44 | ppl 85.15 | epoch 135 | 800/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.45 | ppl 85.66 | epoch 135 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.88 | loss 4.49 | ppl 89.42 | epoch 135 | 1200/ 1568 batches | lr 30.00 | ms/batch 308.58 | loss 4.43 | ppl 84.23 | epoch 135 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.46 | loss 4.42 | ppl 83.48 | epoch 135 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.85 | loss 4.40 | ppl 81.40 ----------------------------------------------------------------------------------------- | end of epoch 135 | time: 513.45s | valid loss 4.81 | valid ppl 122.64 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 136 | 200/ 1568 batches | lr 30.00 | ms/batch 304.68 | loss 4.48 | ppl 87.93 | epoch 136 | 400/ 1568 batches | lr 30.00 | ms/batch 304.88 | loss 4.45 | ppl 85.27 | epoch 136 | 600/ 1568 batches | lr 30.00 | ms/batch 299.50 | loss 4.44 | ppl 84.44 | epoch 136 | 800/ 1568 batches | lr 30.00 | ms/batch 301.31 | loss 4.44 | ppl 84.88 | epoch 136 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.20 | loss 4.50 | ppl 89.68 | epoch 136 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.97 | loss 4.44 | ppl 84.64 | epoch 136 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.43 | ppl 83.68 | epoch 136 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.92 | loss 4.41 | ppl 81.94 ----------------------------------------------------------------------------------------- | end of epoch 136 | time: 513.42s | valid loss 4.81 | valid ppl 122.58 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 137 | 200/ 1568 batches | lr 30.00 | ms/batch 298.21 | loss 4.46 | ppl 86.52 | epoch 137 | 400/ 1568 batches | lr 30.00 | ms/batch 300.86 | loss 4.45 | ppl 85.43 | epoch 137 | 600/ 1568 batches | lr 30.00 | ms/batch 300.87 | loss 4.43 | ppl 83.67 | epoch 137 | 800/ 1568 batches | lr 30.00 | ms/batch 303.89 | loss 4.44 | ppl 84.52 | epoch 137 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.49 | ppl 88.71 | epoch 137 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.95 | loss 4.41 | ppl 82.62 | epoch 137 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.02 | loss 4.42 | ppl 83.50 | epoch 137 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.90 | loss 4.40 | ppl 81.70 ----------------------------------------------------------------------------------------- | end of epoch 137 | time: 513.60s | valid loss 4.81 | valid ppl 122.52 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 138 | 200/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 4.46 | ppl 86.87 | epoch 138 | 400/ 1568 batches | lr 30.00 | ms/batch 298.08 | loss 4.44 | ppl 84.82 | epoch 138 | 600/ 1568 batches | lr 30.00 | ms/batch 308.11 | loss 4.42 | ppl 82.86 | epoch 138 | 800/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.43 | ppl 84.32 | epoch 138 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.48 | ppl 87.96 | epoch 138 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.41 | loss 4.43 | ppl 83.83 | epoch 138 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.33 | loss 4.43 | ppl 83.64 | epoch 138 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.73 | loss 4.39 | ppl 80.55 ----------------------------------------------------------------------------------------- | end of epoch 138 | time: 514.32s | valid loss 4.81 | valid ppl 122.46 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 139 | 200/ 1568 batches | lr 30.00 | ms/batch 307.97 | loss 4.46 | ppl 86.60 | epoch 139 | 400/ 1568 batches | lr 30.00 | ms/batch 300.27 | loss 4.44 | ppl 85.20 | epoch 139 | 600/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.42 | ppl 83.49 | epoch 139 | 800/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.44 | ppl 84.37 | epoch 139 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.95 | loss 4.49 | ppl 89.21 | epoch 139 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.17 | loss 4.43 | ppl 84.03 | epoch 139 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.42 | ppl 83.35 | epoch 139 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.75 | loss 4.40 | ppl 81.55 ----------------------------------------------------------------------------------------- | end of epoch 139 | time: 513.67s | valid loss 4.81 | valid ppl 122.40 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 140 | 200/ 1568 batches | lr 30.00 | ms/batch 307.37 | loss 4.45 | ppl 85.84 | epoch 140 | 400/ 1568 batches | lr 30.00 | ms/batch 297.12 | loss 4.44 | ppl 84.80 | epoch 140 | 600/ 1568 batches | lr 30.00 | ms/batch 304.58 | loss 4.43 | ppl 83.81 | epoch 140 | 800/ 1568 batches | lr 30.00 | ms/batch 301.31 | loss 4.43 | ppl 84.28 | epoch 140 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.32 | loss 4.46 | ppl 86.49 | epoch 140 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.26 | loss 4.42 | ppl 83.01 | epoch 140 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.20 | loss 4.40 | ppl 81.82 | epoch 140 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.17 | loss 4.40 | ppl 81.22 ----------------------------------------------------------------------------------------- | end of epoch 140 | time: 513.87s | valid loss 4.81 | valid ppl 122.35 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 141 | 200/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.46 | ppl 86.07 | epoch 141 | 400/ 1568 batches | lr 30.00 | ms/batch 299.42 | loss 4.42 | ppl 83.49 | epoch 141 | 600/ 1568 batches | lr 30.00 | ms/batch 298.74 | loss 4.42 | ppl 83.00 | epoch 141 | 800/ 1568 batches | lr 30.00 | ms/batch 301.86 | loss 4.41 | ppl 82.57 | epoch 141 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.14 | loss 4.48 | ppl 87.81 | epoch 141 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.76 | loss 4.43 | ppl 83.68 | epoch 141 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.92 | loss 4.41 | ppl 81.99 | epoch 141 | 1600/ 1568 batches | lr 30.00 | ms/batch 291.80 | loss 4.40 | ppl 81.61 ----------------------------------------------------------------------------------------- | end of epoch 141 | time: 514.04s | valid loss 4.81 | valid ppl 122.29 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 142 | 200/ 1568 batches | lr 30.00 | ms/batch 301.23 | loss 4.47 | ppl 87.31 | epoch 142 | 400/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.45 | ppl 85.55 | epoch 142 | 600/ 1568 batches | lr 30.00 | ms/batch 299.18 | loss 4.40 | ppl 81.79 | epoch 142 | 800/ 1568 batches | lr 30.00 | ms/batch 294.02 | loss 4.43 | ppl 83.85 | epoch 142 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.00 | loss 4.48 | ppl 88.08 | epoch 142 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.85 | loss 4.40 | ppl 81.30 | epoch 142 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.71 | loss 4.39 | ppl 80.65 | epoch 142 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.64 | loss 4.39 | ppl 80.94 ----------------------------------------------------------------------------------------- | end of epoch 142 | time: 514.16s | valid loss 4.81 | valid ppl 122.24 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 143 | 200/ 1568 batches | lr 30.00 | ms/batch 301.56 | loss 4.45 | ppl 86.03 | epoch 143 | 400/ 1568 batches | lr 30.00 | ms/batch 300.00 | loss 4.43 | ppl 83.68 | epoch 143 | 600/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.42 | ppl 83.47 | epoch 143 | 800/ 1568 batches | lr 30.00 | ms/batch 302.26 | loss 4.42 | ppl 83.15 | epoch 143 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.47 | ppl 87.40 | epoch 143 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.44 | loss 4.43 | ppl 83.62 | epoch 143 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.66 | loss 4.39 | ppl 80.97 | epoch 143 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.39 | ppl 80.65 ----------------------------------------------------------------------------------------- | end of epoch 143 | time: 513.92s | valid loss 4.81 | valid ppl 122.19 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 144 | 200/ 1568 batches | lr 30.00 | ms/batch 299.30 | loss 4.46 | ppl 86.20 | epoch 144 | 400/ 1568 batches | lr 30.00 | ms/batch 297.34 | loss 4.43 | ppl 84.21 | epoch 144 | 600/ 1568 batches | lr 30.00 | ms/batch 299.44 | loss 4.40 | ppl 81.64 | epoch 144 | 800/ 1568 batches | lr 30.00 | ms/batch 301.16 | loss 4.39 | ppl 80.90 | epoch 144 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.89 | loss 4.46 | ppl 86.63 | epoch 144 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.41 | ppl 82.39 | epoch 144 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.51 | loss 4.39 | ppl 80.24 | epoch 144 | 1600/ 1568 batches | lr 30.00 | ms/batch 309.12 | loss 4.40 | ppl 81.40 ----------------------------------------------------------------------------------------- | end of epoch 144 | time: 513.83s | valid loss 4.81 | valid ppl 122.14 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 145 | 200/ 1568 batches | lr 30.00 | ms/batch 296.94 | loss 4.44 | ppl 84.59 | epoch 145 | 400/ 1568 batches | lr 30.00 | ms/batch 303.40 | loss 4.44 | ppl 85.09 | epoch 145 | 600/ 1568 batches | lr 30.00 | ms/batch 302.65 | loss 4.42 | ppl 83.07 | epoch 145 | 800/ 1568 batches | lr 30.00 | ms/batch 299.07 | loss 4.42 | ppl 82.96 | epoch 145 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.21 | loss 4.47 | ppl 87.23 | epoch 145 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.68 | loss 4.41 | ppl 82.43 | epoch 145 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.82 | loss 4.40 | ppl 81.17 | epoch 145 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.25 | loss 4.38 | ppl 79.52 ----------------------------------------------------------------------------------------- | end of epoch 145 | time: 514.12s | valid loss 4.80 | valid ppl 122.08 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 146 | 200/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.45 | ppl 85.98 | epoch 146 | 400/ 1568 batches | lr 30.00 | ms/batch 294.75 | loss 4.43 | ppl 83.62 | epoch 146 | 600/ 1568 batches | lr 30.00 | ms/batch 304.35 | loss 4.40 | ppl 81.55 | epoch 146 | 800/ 1568 batches | lr 30.00 | ms/batch 302.55 | loss 4.41 | ppl 82.65 | epoch 146 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.89 | loss 4.48 | ppl 88.04 | epoch 146 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.12 | loss 4.41 | ppl 82.25 | epoch 146 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.38 | ppl 80.08 | epoch 146 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.48 | loss 4.36 | ppl 78.63 ----------------------------------------------------------------------------------------- | end of epoch 146 | time: 514.10s | valid loss 4.80 | valid ppl 122.03 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 147 | 200/ 1568 batches | lr 30.00 | ms/batch 298.19 | loss 4.45 | ppl 85.25 | epoch 147 | 400/ 1568 batches | lr 30.00 | ms/batch 297.75 | loss 4.43 | ppl 83.64 | epoch 147 | 600/ 1568 batches | lr 30.00 | ms/batch 297.10 | loss 4.41 | ppl 82.50 | epoch 147 | 800/ 1568 batches | lr 30.00 | ms/batch 305.48 | loss 4.40 | ppl 81.77 | epoch 147 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.11 | loss 4.47 | ppl 87.48 | epoch 147 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.08 | loss 4.40 | ppl 81.37 | epoch 147 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.59 | loss 4.38 | ppl 80.08 | epoch 147 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.98 | loss 4.38 | ppl 79.53 ----------------------------------------------------------------------------------------- | end of epoch 147 | time: 513.95s | valid loss 4.80 | valid ppl 121.98 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 148 | 200/ 1568 batches | lr 30.00 | ms/batch 297.51 | loss 4.45 | ppl 86.00 | epoch 148 | 400/ 1568 batches | lr 30.00 | ms/batch 302.37 | loss 4.44 | ppl 84.71 | epoch 148 | 600/ 1568 batches | lr 30.00 | ms/batch 296.81 | loss 4.38 | ppl 79.72 | epoch 148 | 800/ 1568 batches | lr 30.00 | ms/batch 298.71 | loss 4.40 | ppl 81.63 | epoch 148 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.27 | loss 4.47 | ppl 87.62 | epoch 148 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.39 | ppl 80.98 | epoch 148 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.90 | loss 4.40 | ppl 81.25 | epoch 148 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.37 | ppl 79.37 ----------------------------------------------------------------------------------------- | end of epoch 148 | time: 513.98s | valid loss 4.80 | valid ppl 121.94 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 149 | 200/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.44 | ppl 85.09 | epoch 149 | 400/ 1568 batches | lr 30.00 | ms/batch 305.17 | loss 4.41 | ppl 82.14 | epoch 149 | 600/ 1568 batches | lr 30.00 | ms/batch 303.69 | loss 4.40 | ppl 81.63 | epoch 149 | 800/ 1568 batches | lr 30.00 | ms/batch 303.52 | loss 4.42 | ppl 82.82 | epoch 149 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.36 | loss 4.45 | ppl 85.61 | epoch 149 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.26 | loss 4.41 | ppl 81.88 | epoch 149 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.10 | loss 4.39 | ppl 80.71 | epoch 149 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.73 | loss 4.37 | ppl 78.83 ----------------------------------------------------------------------------------------- | end of epoch 149 | time: 513.65s | valid loss 4.80 | valid ppl 121.89 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 150 | 200/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.44 | ppl 84.83 | epoch 150 | 400/ 1568 batches | lr 30.00 | ms/batch 299.16 | loss 4.41 | ppl 82.44 | epoch 150 | 600/ 1568 batches | lr 30.00 | ms/batch 301.10 | loss 4.38 | ppl 79.75 | epoch 150 | 800/ 1568 batches | lr 30.00 | ms/batch 296.83 | loss 4.41 | ppl 82.02 | epoch 150 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.15 | loss 4.46 | ppl 86.30 | epoch 150 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.92 | loss 4.38 | ppl 79.85 | epoch 150 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.56 | loss 4.37 | ppl 79.32 | epoch 150 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.32 | loss 4.37 | ppl 79.13 ----------------------------------------------------------------------------------------- | end of epoch 150 | time: 513.96s | valid loss 4.80 | valid ppl 121.85 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 151 | 200/ 1568 batches | lr 30.00 | ms/batch 300.71 | loss 4.44 | ppl 84.63 | epoch 151 | 400/ 1568 batches | lr 30.00 | ms/batch 303.36 | loss 4.43 | ppl 83.58 | epoch 151 | 600/ 1568 batches | lr 30.00 | ms/batch 304.72 | loss 4.40 | ppl 81.29 | epoch 151 | 800/ 1568 batches | lr 30.00 | ms/batch 297.72 | loss 4.40 | ppl 81.77 | epoch 151 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.45 | ppl 85.56 | epoch 151 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.83 | loss 4.39 | ppl 80.91 | epoch 151 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.21 | loss 4.38 | ppl 79.59 | epoch 151 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.25 | loss 4.36 | ppl 78.06 ----------------------------------------------------------------------------------------- | end of epoch 151 | time: 513.87s | valid loss 4.80 | valid ppl 121.81 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 152 | 200/ 1568 batches | lr 30.00 | ms/batch 303.98 | loss 4.44 | ppl 84.60 | epoch 152 | 400/ 1568 batches | lr 30.00 | ms/batch 302.60 | loss 4.41 | ppl 82.43 | epoch 152 | 600/ 1568 batches | lr 30.00 | ms/batch 299.80 | loss 4.39 | ppl 80.69 | epoch 152 | 800/ 1568 batches | lr 30.00 | ms/batch 298.46 | loss 4.39 | ppl 81.03 | epoch 152 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.25 | loss 4.45 | ppl 85.45 | epoch 152 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.38 | ppl 80.16 | epoch 152 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.64 | loss 4.37 | ppl 79.27 | epoch 152 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.36 | ppl 78.58 ----------------------------------------------------------------------------------------- | end of epoch 152 | time: 514.57s | valid loss 4.80 | valid ppl 121.76 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 153 | 200/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.43 | ppl 84.05 | epoch 153 | 400/ 1568 batches | lr 30.00 | ms/batch 298.80 | loss 4.42 | ppl 82.77 | epoch 153 | 600/ 1568 batches | lr 30.00 | ms/batch 302.19 | loss 4.39 | ppl 80.78 | epoch 153 | 800/ 1568 batches | lr 30.00 | ms/batch 298.19 | loss 4.41 | ppl 82.09 | epoch 153 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.46 | loss 4.44 | ppl 85.13 | epoch 153 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.45 | loss 4.39 | ppl 80.56 | epoch 153 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.37 | ppl 79.41 | epoch 153 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.59 | loss 4.36 | ppl 78.44 ----------------------------------------------------------------------------------------- | end of epoch 153 | time: 514.03s | valid loss 4.80 | valid ppl 121.72 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 154 | 200/ 1568 batches | lr 30.00 | ms/batch 305.73 | loss 4.46 | ppl 86.35 | epoch 154 | 400/ 1568 batches | lr 30.00 | ms/batch 303.35 | loss 4.41 | ppl 82.12 | epoch 154 | 600/ 1568 batches | lr 30.00 | ms/batch 299.69 | loss 4.38 | ppl 79.68 | epoch 154 | 800/ 1568 batches | lr 30.00 | ms/batch 303.11 | loss 4.40 | ppl 81.56 | epoch 154 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.09 | loss 4.45 | ppl 85.23 | epoch 154 | 1200/ 1568 batches | lr 30.00 | ms/batch 294.46 | loss 4.39 | ppl 80.98 | epoch 154 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.37 | ppl 79.30 | epoch 154 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.35 | loss 4.36 | ppl 78.19 ----------------------------------------------------------------------------------------- | end of epoch 154 | time: 513.96s | valid loss 4.80 | valid ppl 121.67 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 155 | 200/ 1568 batches | lr 30.00 | ms/batch 303.98 | loss 4.44 | ppl 84.63 | epoch 155 | 400/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.40 | ppl 81.86 | epoch 155 | 600/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.37 | ppl 79.16 | epoch 155 | 800/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.38 | ppl 79.98 | epoch 155 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.79 | loss 4.46 | ppl 86.72 | epoch 155 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.82 | loss 4.38 | ppl 80.15 | epoch 155 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.36 | ppl 78.65 | epoch 155 | 1600/ 1568 batches | lr 30.00 | ms/batch 307.46 | loss 4.36 | ppl 78.32 ----------------------------------------------------------------------------------------- | end of epoch 155 | time: 514.86s | valid loss 4.80 | valid ppl 121.64 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 156 | 200/ 1568 batches | lr 30.00 | ms/batch 304.50 | loss 4.43 | ppl 83.80 | epoch 156 | 400/ 1568 batches | lr 30.00 | ms/batch 298.66 | loss 4.40 | ppl 81.85 | epoch 156 | 600/ 1568 batches | lr 30.00 | ms/batch 299.37 | loss 4.36 | ppl 78.61 | epoch 156 | 800/ 1568 batches | lr 30.00 | ms/batch 304.30 | loss 4.39 | ppl 80.60 | epoch 156 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.56 | loss 4.43 | ppl 84.07 | epoch 156 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.37 | ppl 78.92 | epoch 156 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.44 | loss 4.36 | ppl 78.33 | epoch 156 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.50 | loss 4.34 | ppl 76.46 ----------------------------------------------------------------------------------------- | end of epoch 156 | time: 513.83s | valid loss 4.80 | valid ppl 121.60 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 157 | 200/ 1568 batches | lr 30.00 | ms/batch 304.72 | loss 4.42 | ppl 83.26 | epoch 157 | 400/ 1568 batches | lr 30.00 | ms/batch 302.81 | loss 4.40 | ppl 81.17 | epoch 157 | 600/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.38 | ppl 79.76 | epoch 157 | 800/ 1568 batches | lr 30.00 | ms/batch 300.26 | loss 4.40 | ppl 81.21 | epoch 157 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.76 | loss 4.44 | ppl 84.37 | epoch 157 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.14 | loss 4.38 | ppl 80.23 | epoch 157 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.37 | ppl 78.98 | epoch 157 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.95 | loss 4.35 | ppl 77.37 ----------------------------------------------------------------------------------------- | end of epoch 157 | time: 513.80s | valid loss 4.80 | valid ppl 121.56 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 158 | 200/ 1568 batches | lr 30.00 | ms/batch 307.75 | loss 4.41 | ppl 82.14 | epoch 158 | 400/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.39 | ppl 80.88 | epoch 158 | 600/ 1568 batches | lr 30.00 | ms/batch 306.22 | loss 4.37 | ppl 78.73 | epoch 158 | 800/ 1568 batches | lr 30.00 | ms/batch 298.90 | loss 4.39 | ppl 80.98 | epoch 158 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.44 | ppl 84.59 | epoch 158 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.16 | loss 4.38 | ppl 80.11 | epoch 158 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.61 | loss 4.37 | ppl 79.13 | epoch 158 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.75 | loss 4.34 | ppl 76.79 ----------------------------------------------------------------------------------------- | end of epoch 158 | time: 514.34s | valid loss 4.80 | valid ppl 121.52 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 159 | 200/ 1568 batches | lr 30.00 | ms/batch 302.26 | loss 4.42 | ppl 83.12 | epoch 159 | 400/ 1568 batches | lr 30.00 | ms/batch 300.92 | loss 4.40 | ppl 81.42 | epoch 159 | 600/ 1568 batches | lr 30.00 | ms/batch 303.78 | loss 4.37 | ppl 78.96 | epoch 159 | 800/ 1568 batches | lr 30.00 | ms/batch 297.29 | loss 4.37 | ppl 79.06 | epoch 159 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.44 | ppl 85.02 | epoch 159 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.02 | loss 4.37 | ppl 78.87 | epoch 159 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.72 | loss 4.37 | ppl 79.30 | epoch 159 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.27 | loss 4.34 | ppl 76.82 ----------------------------------------------------------------------------------------- | end of epoch 159 | time: 513.60s | valid loss 4.80 | valid ppl 121.48 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 160 | 200/ 1568 batches | lr 30.00 | ms/batch 307.62 | loss 4.43 | ppl 83.84 | epoch 160 | 400/ 1568 batches | lr 30.00 | ms/batch 302.02 | loss 4.39 | ppl 80.93 | epoch 160 | 600/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.37 | ppl 78.82 | epoch 160 | 800/ 1568 batches | lr 30.00 | ms/batch 300.83 | loss 4.39 | ppl 80.98 | epoch 160 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.69 | loss 4.42 | ppl 83.48 | epoch 160 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.29 | loss 4.37 | ppl 79.24 | epoch 160 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.23 | loss 4.35 | ppl 77.44 | epoch 160 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.00 | loss 4.36 | ppl 78.40 ----------------------------------------------------------------------------------------- | end of epoch 160 | time: 513.72s | valid loss 4.80 | valid ppl 121.44 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 161 | 200/ 1568 batches | lr 30.00 | ms/batch 303.84 | loss 4.40 | ppl 81.71 | epoch 161 | 400/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.40 | ppl 81.68 | epoch 161 | 600/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.36 | ppl 78.18 | epoch 161 | 800/ 1568 batches | lr 30.00 | ms/batch 298.94 | loss 4.39 | ppl 80.38 | epoch 161 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.47 | loss 4.44 | ppl 84.46 | epoch 161 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.93 | loss 4.36 | ppl 78.63 | epoch 161 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.97 | loss 4.36 | ppl 78.31 | epoch 161 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.72 | loss 4.35 | ppl 77.19 ----------------------------------------------------------------------------------------- | end of epoch 161 | time: 513.76s | valid loss 4.80 | valid ppl 121.41 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 162 | 200/ 1568 batches | lr 30.00 | ms/batch 303.45 | loss 4.40 | ppl 81.65 | epoch 162 | 400/ 1568 batches | lr 30.00 | ms/batch 302.96 | loss 4.39 | ppl 80.89 | epoch 162 | 600/ 1568 batches | lr 30.00 | ms/batch 306.87 | loss 4.35 | ppl 77.76 | epoch 162 | 800/ 1568 batches | lr 30.00 | ms/batch 306.04 | loss 4.39 | ppl 80.71 | epoch 162 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.42 | ppl 83.03 | epoch 162 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.38 | ppl 80.15 | epoch 162 | 1400/ 1568 batches | lr 30.00 | ms/batch 308.59 | loss 4.36 | ppl 78.19 | epoch 162 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.90 | loss 4.34 | ppl 76.36 ----------------------------------------------------------------------------------------- | end of epoch 162 | time: 513.65s | valid loss 4.80 | valid ppl 121.38 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 163 | 200/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.41 | ppl 82.26 | epoch 163 | 400/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.38 | ppl 80.01 | epoch 163 | 600/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 4.36 | ppl 78.44 | epoch 163 | 800/ 1568 batches | lr 30.00 | ms/batch 300.24 | loss 4.38 | ppl 79.72 | epoch 163 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.34 | loss 4.45 | ppl 85.23 | epoch 163 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.24 | loss 4.37 | ppl 79.34 | epoch 163 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.55 | loss 4.36 | ppl 78.15 | epoch 163 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.14 | loss 4.35 | ppl 77.42 ----------------------------------------------------------------------------------------- | end of epoch 163 | time: 513.87s | valid loss 4.80 | valid ppl 121.34 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 164 | 200/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.41 | ppl 82.38 | epoch 164 | 400/ 1568 batches | lr 30.00 | ms/batch 298.65 | loss 4.38 | ppl 79.80 | epoch 164 | 600/ 1568 batches | lr 30.00 | ms/batch 302.59 | loss 4.36 | ppl 78.45 | epoch 164 | 800/ 1568 batches | lr 30.00 | ms/batch 300.17 | loss 4.37 | ppl 78.95 | epoch 164 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.60 | loss 4.42 | ppl 83.03 | epoch 164 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.00 | loss 4.36 | ppl 78.11 | epoch 164 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.56 | loss 4.35 | ppl 77.76 | epoch 164 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.81 | loss 4.33 | ppl 76.14 ----------------------------------------------------------------------------------------- | end of epoch 164 | time: 514.13s | valid loss 4.80 | valid ppl 121.31 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 165 | 200/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.40 | ppl 81.54 | epoch 165 | 400/ 1568 batches | lr 30.00 | ms/batch 301.18 | loss 4.39 | ppl 80.27 | epoch 165 | 600/ 1568 batches | lr 30.00 | ms/batch 298.28 | loss 4.36 | ppl 77.88 | epoch 165 | 800/ 1568 batches | lr 30.00 | ms/batch 299.98 | loss 4.36 | ppl 78.65 | epoch 165 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.30 | loss 4.43 | ppl 84.15 | epoch 165 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.56 | loss 4.35 | ppl 77.82 | epoch 165 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.77 | loss 4.34 | ppl 77.08 | epoch 165 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.41 | loss 4.36 | ppl 78.07 ----------------------------------------------------------------------------------------- | end of epoch 165 | time: 513.79s | valid loss 4.80 | valid ppl 121.28 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 166 | 200/ 1568 batches | lr 30.00 | ms/batch 298.76 | loss 4.39 | ppl 80.32 | epoch 166 | 400/ 1568 batches | lr 30.00 | ms/batch 302.23 | loss 4.38 | ppl 80.12 | epoch 166 | 600/ 1568 batches | lr 30.00 | ms/batch 303.57 | loss 4.36 | ppl 77.97 | epoch 166 | 800/ 1568 batches | lr 30.00 | ms/batch 305.64 | loss 4.37 | ppl 78.87 | epoch 166 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.64 | loss 4.41 | ppl 82.37 | epoch 166 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.01 | loss 4.36 | ppl 78.64 | epoch 166 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.34 | ppl 76.90 | epoch 166 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.89 | loss 4.33 | ppl 76.28 ----------------------------------------------------------------------------------------- | end of epoch 166 | time: 513.83s | valid loss 4.80 | valid ppl 121.24 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 167 | 200/ 1568 batches | lr 30.00 | ms/batch 297.71 | loss 4.40 | ppl 81.39 | epoch 167 | 400/ 1568 batches | lr 30.00 | ms/batch 300.86 | loss 4.37 | ppl 79.20 | epoch 167 | 600/ 1568 batches | lr 30.00 | ms/batch 303.41 | loss 4.34 | ppl 77.04 | epoch 167 | 800/ 1568 batches | lr 30.00 | ms/batch 301.27 | loss 4.36 | ppl 78.48 | epoch 167 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.43 | ppl 83.71 | epoch 167 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.37 | ppl 79.14 | epoch 167 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.12 | loss 4.35 | ppl 77.72 | epoch 167 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.57 | loss 4.35 | ppl 77.67 ----------------------------------------------------------------------------------------- | end of epoch 167 | time: 513.87s | valid loss 4.80 | valid ppl 121.21 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 168 | 200/ 1568 batches | lr 30.00 | ms/batch 301.14 | loss 4.40 | ppl 81.48 | epoch 168 | 400/ 1568 batches | lr 30.00 | ms/batch 295.21 | loss 4.38 | ppl 80.21 | epoch 168 | 600/ 1568 batches | lr 30.00 | ms/batch 297.31 | loss 4.35 | ppl 77.16 | epoch 168 | 800/ 1568 batches | lr 30.00 | ms/batch 300.71 | loss 4.36 | ppl 77.96 | epoch 168 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.94 | loss 4.42 | ppl 82.87 | epoch 168 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.00 | loss 4.36 | ppl 78.09 | epoch 168 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.34 | ppl 76.34 | epoch 168 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.36 | ppl 78.39 ----------------------------------------------------------------------------------------- | end of epoch 168 | time: 514.39s | valid loss 4.80 | valid ppl 121.19 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 169 | 200/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.39 | ppl 80.38 | epoch 169 | 400/ 1568 batches | lr 30.00 | ms/batch 303.76 | loss 4.38 | ppl 79.99 | epoch 169 | 600/ 1568 batches | lr 30.00 | ms/batch 301.34 | loss 4.35 | ppl 77.44 | epoch 169 | 800/ 1568 batches | lr 30.00 | ms/batch 298.08 | loss 4.36 | ppl 78.11 | epoch 169 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.17 | loss 4.41 | ppl 82.40 | epoch 169 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.35 | loss 4.35 | ppl 77.85 | epoch 169 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.71 | loss 4.34 | ppl 76.55 | epoch 169 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.32 | ppl 75.36 ----------------------------------------------------------------------------------------- | end of epoch 169 | time: 513.83s | valid loss 4.80 | valid ppl 121.16 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 170 | 200/ 1568 batches | lr 30.00 | ms/batch 305.61 | loss 4.39 | ppl 80.77 | epoch 170 | 400/ 1568 batches | lr 30.00 | ms/batch 296.69 | loss 4.36 | ppl 78.22 | epoch 170 | 600/ 1568 batches | lr 30.00 | ms/batch 301.50 | loss 4.35 | ppl 77.58 | epoch 170 | 800/ 1568 batches | lr 30.00 | ms/batch 303.33 | loss 4.35 | ppl 77.85 | epoch 170 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.15 | loss 4.43 | ppl 83.61 | epoch 170 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.35 | loss 4.35 | ppl 77.17 | epoch 170 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.04 | loss 4.33 | ppl 76.28 | epoch 170 | 1600/ 1568 batches | lr 30.00 | ms/batch 296.89 | loss 4.31 | ppl 74.55 ----------------------------------------------------------------------------------------- | end of epoch 170 | time: 514.14s | valid loss 4.80 | valid ppl 121.13 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 171 | 200/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.39 | ppl 80.82 | epoch 171 | 400/ 1568 batches | lr 30.00 | ms/batch 299.47 | loss 4.37 | ppl 79.35 | epoch 171 | 600/ 1568 batches | lr 30.00 | ms/batch 296.98 | loss 4.35 | ppl 77.63 | epoch 171 | 800/ 1568 batches | lr 30.00 | ms/batch 300.10 | loss 4.35 | ppl 77.13 | epoch 171 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.09 | loss 4.40 | ppl 81.48 | epoch 171 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.42 | loss 4.34 | ppl 76.60 | epoch 171 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.34 | ppl 76.70 | epoch 171 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.32 | ppl 75.05 ----------------------------------------------------------------------------------------- | end of epoch 171 | time: 514.25s | valid loss 4.80 | valid ppl 121.10 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 172 | 200/ 1568 batches | lr 30.00 | ms/batch 299.36 | loss 4.39 | ppl 80.36 | epoch 172 | 400/ 1568 batches | lr 30.00 | ms/batch 299.73 | loss 4.37 | ppl 78.79 | epoch 172 | 600/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.34 | ppl 76.86 | epoch 172 | 800/ 1568 batches | lr 30.00 | ms/batch 297.35 | loss 4.36 | ppl 78.29 | epoch 172 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.46 | loss 4.41 | ppl 82.48 | epoch 172 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.58 | loss 4.35 | ppl 77.29 | epoch 172 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.02 | loss 4.33 | ppl 75.76 | epoch 172 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.08 | loss 4.33 | ppl 76.21 ----------------------------------------------------------------------------------------- | end of epoch 172 | time: 514.16s | valid loss 4.80 | valid ppl 121.07 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 173 | 200/ 1568 batches | lr 30.00 | ms/batch 304.01 | loss 4.39 | ppl 80.87 | epoch 173 | 400/ 1568 batches | lr 30.00 | ms/batch 303.68 | loss 4.36 | ppl 78.43 | epoch 173 | 600/ 1568 batches | lr 30.00 | ms/batch 300.04 | loss 4.36 | ppl 77.96 | epoch 173 | 800/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.36 | ppl 78.37 | epoch 173 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.03 | loss 4.39 | ppl 80.91 | epoch 173 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.28 | loss 4.35 | ppl 77.57 | epoch 173 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.35 | ppl 77.56 | epoch 173 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.42 | loss 4.33 | ppl 75.78 ----------------------------------------------------------------------------------------- | end of epoch 173 | time: 513.58s | valid loss 4.80 | valid ppl 121.04 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 174 | 200/ 1568 batches | lr 30.00 | ms/batch 303.95 | loss 4.39 | ppl 80.96 | epoch 174 | 400/ 1568 batches | lr 30.00 | ms/batch 300.86 | loss 4.38 | ppl 80.01 | epoch 174 | 600/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.33 | ppl 75.78 | epoch 174 | 800/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.36 | ppl 78.17 | epoch 174 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.20 | loss 4.40 | ppl 81.48 | epoch 174 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.75 | loss 4.36 | ppl 78.13 | epoch 174 | 1400/ 1568 batches | lr 30.00 | ms/batch 306.69 | loss 4.34 | ppl 76.65 | epoch 174 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.24 | loss 4.32 | ppl 75.13 ----------------------------------------------------------------------------------------- | end of epoch 174 | time: 513.76s | valid loss 4.80 | valid ppl 121.01 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 175 | 200/ 1568 batches | lr 30.00 | ms/batch 309.76 | loss 4.38 | ppl 79.83 | epoch 175 | 400/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.37 | ppl 79.09 | epoch 175 | 600/ 1568 batches | lr 30.00 | ms/batch 304.38 | loss 4.33 | ppl 75.92 | epoch 175 | 800/ 1568 batches | lr 30.00 | ms/batch 298.07 | loss 4.36 | ppl 78.47 | epoch 175 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.61 | loss 4.39 | ppl 80.42 | epoch 175 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.17 | loss 4.35 | ppl 77.24 | epoch 175 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.36 | loss 4.32 | ppl 74.92 | epoch 175 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.31 | ppl 74.34 ----------------------------------------------------------------------------------------- | end of epoch 175 | time: 515.40s | valid loss 4.80 | valid ppl 120.98 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 176 | 200/ 1568 batches | lr 30.00 | ms/batch 303.34 | loss 4.38 | ppl 79.72 | epoch 176 | 400/ 1568 batches | lr 30.00 | ms/batch 299.14 | loss 4.37 | ppl 78.75 | epoch 176 | 600/ 1568 batches | lr 30.00 | ms/batch 300.74 | loss 4.33 | ppl 75.66 | epoch 176 | 800/ 1568 batches | lr 30.00 | ms/batch 298.26 | loss 4.34 | ppl 76.84 | epoch 176 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.22 | loss 4.40 | ppl 81.76 | epoch 176 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.35 | ppl 77.70 | epoch 176 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.21 | loss 4.32 | ppl 75.03 | epoch 176 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.15 | loss 4.32 | ppl 75.34 ----------------------------------------------------------------------------------------- | end of epoch 176 | time: 515.44s | valid loss 4.80 | valid ppl 120.96 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 177 | 200/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.38 | ppl 79.66 | epoch 177 | 400/ 1568 batches | lr 30.00 | ms/batch 298.02 | loss 4.36 | ppl 78.65 | epoch 177 | 600/ 1568 batches | lr 30.00 | ms/batch 303.95 | loss 4.34 | ppl 76.79 | epoch 177 | 800/ 1568 batches | lr 30.00 | ms/batch 304.10 | loss 4.35 | ppl 77.30 | epoch 177 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.18 | loss 4.41 | ppl 82.27 | epoch 177 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.07 | loss 4.36 | ppl 78.29 | epoch 177 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.76 | loss 4.32 | ppl 75.26 | epoch 177 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.48 | loss 4.32 | ppl 74.82 ----------------------------------------------------------------------------------------- | end of epoch 177 | time: 514.83s | valid loss 4.80 | valid ppl 120.93 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 178 | 200/ 1568 batches | lr 30.00 | ms/batch 304.36 | loss 4.37 | ppl 78.95 | epoch 178 | 400/ 1568 batches | lr 30.00 | ms/batch 298.57 | loss 4.37 | ppl 78.65 | epoch 178 | 600/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.32 | ppl 75.25 | epoch 178 | 800/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.34 | ppl 77.08 | epoch 178 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.73 | loss 4.39 | ppl 80.64 | epoch 178 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.72 | loss 4.34 | ppl 76.68 | epoch 178 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.24 | loss 4.33 | ppl 75.89 | epoch 178 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.25 | loss 4.30 | ppl 74.07 ----------------------------------------------------------------------------------------- | end of epoch 178 | time: 514.65s | valid loss 4.79 | valid ppl 120.90 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 179 | 200/ 1568 batches | lr 30.00 | ms/batch 307.06 | loss 4.39 | ppl 80.49 | epoch 179 | 400/ 1568 batches | lr 30.00 | ms/batch 297.67 | loss 4.36 | ppl 78.04 | epoch 179 | 600/ 1568 batches | lr 30.00 | ms/batch 302.21 | loss 4.34 | ppl 76.43 | epoch 179 | 800/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.34 | ppl 76.84 | epoch 179 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.39 | ppl 80.34 | epoch 179 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.34 | ppl 76.35 | epoch 179 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.10 | loss 4.32 | ppl 75.42 | epoch 179 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.84 | loss 4.32 | ppl 75.41 ----------------------------------------------------------------------------------------- | end of epoch 179 | time: 514.68s | valid loss 4.79 | valid ppl 120.87 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 180 | 200/ 1568 batches | lr 30.00 | ms/batch 306.56 | loss 4.39 | ppl 80.95 | epoch 180 | 400/ 1568 batches | lr 30.00 | ms/batch 299.57 | loss 4.36 | ppl 78.34 | epoch 180 | 600/ 1568 batches | lr 30.00 | ms/batch 304.37 | loss 4.32 | ppl 75.27 | epoch 180 | 800/ 1568 batches | lr 30.00 | ms/batch 305.93 | loss 4.34 | ppl 77.07 | epoch 180 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.34 | loss 4.39 | ppl 80.86 | epoch 180 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.23 | loss 4.34 | ppl 76.38 | epoch 180 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.32 | ppl 75.03 | epoch 180 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.31 | ppl 74.37 ----------------------------------------------------------------------------------------- | end of epoch 180 | time: 515.19s | valid loss 4.79 | valid ppl 120.85 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 181 | 200/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.38 | ppl 79.80 | epoch 181 | 400/ 1568 batches | lr 30.00 | ms/batch 302.86 | loss 4.37 | ppl 79.06 | epoch 181 | 600/ 1568 batches | lr 30.00 | ms/batch 305.30 | loss 4.33 | ppl 76.12 | epoch 181 | 800/ 1568 batches | lr 30.00 | ms/batch 302.16 | loss 4.34 | ppl 76.72 | epoch 181 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.38 | ppl 80.17 | epoch 181 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.35 | loss 4.33 | ppl 76.10 | epoch 181 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.89 | loss 4.33 | ppl 76.06 | epoch 181 | 1600/ 1568 batches | lr 30.00 | ms/batch 307.12 | loss 4.31 | ppl 74.45 ----------------------------------------------------------------------------------------- | end of epoch 181 | time: 514.96s | valid loss 4.79 | valid ppl 120.82 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 182 | 200/ 1568 batches | lr 30.00 | ms/batch 301.31 | loss 4.38 | ppl 80.05 | epoch 182 | 400/ 1568 batches | lr 30.00 | ms/batch 304.66 | loss 4.37 | ppl 78.82 | epoch 182 | 600/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.33 | ppl 75.66 | epoch 182 | 800/ 1568 batches | lr 30.00 | ms/batch 305.85 | loss 4.34 | ppl 76.40 | epoch 182 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.38 | ppl 79.55 | epoch 182 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.43 | loss 4.33 | ppl 75.80 | epoch 182 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.23 | loss 4.33 | ppl 75.60 | epoch 182 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.12 | loss 4.29 | ppl 72.96 ----------------------------------------------------------------------------------------- | end of epoch 182 | time: 514.80s | valid loss 4.79 | valid ppl 120.80 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 183 | 200/ 1568 batches | lr 30.00 | ms/batch 305.59 | loss 4.39 | ppl 80.26 | epoch 183 | 400/ 1568 batches | lr 30.00 | ms/batch 298.71 | loss 4.36 | ppl 77.97 | epoch 183 | 600/ 1568 batches | lr 30.00 | ms/batch 302.30 | loss 4.32 | ppl 74.96 | epoch 183 | 800/ 1568 batches | lr 30.00 | ms/batch 300.81 | loss 4.33 | ppl 75.93 | epoch 183 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.39 | ppl 80.40 | epoch 183 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.07 | loss 4.32 | ppl 75.39 | epoch 183 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.43 | loss 4.33 | ppl 75.93 | epoch 183 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.33 | loss 4.29 | ppl 73.14 ----------------------------------------------------------------------------------------- | end of epoch 183 | time: 515.16s | valid loss 4.79 | valid ppl 120.79 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 184 | 200/ 1568 batches | lr 30.00 | ms/batch 304.85 | loss 4.39 | ppl 80.46 | epoch 184 | 400/ 1568 batches | lr 30.00 | ms/batch 297.45 | loss 4.35 | ppl 77.18 | epoch 184 | 600/ 1568 batches | lr 30.00 | ms/batch 303.99 | loss 4.32 | ppl 75.42 | epoch 184 | 800/ 1568 batches | lr 30.00 | ms/batch 305.00 | loss 4.33 | ppl 76.06 | epoch 184 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.15 | loss 4.38 | ppl 79.51 | epoch 184 | 1200/ 1568 batches | lr 30.00 | ms/batch 307.44 | loss 4.33 | ppl 75.96 | epoch 184 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.71 | loss 4.32 | ppl 74.96 | epoch 184 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.29 | ppl 72.97 ----------------------------------------------------------------------------------------- | end of epoch 184 | time: 514.96s | valid loss 4.79 | valid ppl 120.77 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 185 | 200/ 1568 batches | lr 30.00 | ms/batch 303.99 | loss 4.36 | ppl 78.49 | epoch 185 | 400/ 1568 batches | lr 30.00 | ms/batch 304.96 | loss 4.35 | ppl 77.65 | epoch 185 | 600/ 1568 batches | lr 30.00 | ms/batch 299.00 | loss 4.31 | ppl 74.59 | epoch 185 | 800/ 1568 batches | lr 30.00 | ms/batch 302.00 | loss 4.34 | ppl 76.33 | epoch 185 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.00 | loss 4.38 | ppl 79.62 | epoch 185 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.34 | loss 4.32 | ppl 75.14 | epoch 185 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.93 | loss 4.32 | ppl 74.89 | epoch 185 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.09 | loss 4.28 | ppl 72.10 ----------------------------------------------------------------------------------------- | end of epoch 185 | time: 515.02s | valid loss 4.79 | valid ppl 120.75 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 186 | 200/ 1568 batches | lr 30.00 | ms/batch 306.63 | loss 4.37 | ppl 79.27 | epoch 186 | 400/ 1568 batches | lr 30.00 | ms/batch 303.03 | loss 4.34 | ppl 76.76 | epoch 186 | 600/ 1568 batches | lr 30.00 | ms/batch 301.92 | loss 4.30 | ppl 73.51 | epoch 186 | 800/ 1568 batches | lr 30.00 | ms/batch 304.23 | loss 4.33 | ppl 75.97 | epoch 186 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.65 | loss 4.37 | ppl 79.04 | epoch 186 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.78 | loss 4.33 | ppl 76.25 | epoch 186 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.05 | loss 4.30 | ppl 73.66 | epoch 186 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.28 | ppl 72.54 ----------------------------------------------------------------------------------------- | end of epoch 186 | time: 515.23s | valid loss 4.79 | valid ppl 120.73 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 187 | 200/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.36 | ppl 78.17 | epoch 187 | 400/ 1568 batches | lr 30.00 | ms/batch 302.63 | loss 4.34 | ppl 76.99 | epoch 187 | 600/ 1568 batches | lr 30.00 | ms/batch 302.46 | loss 4.32 | ppl 74.91 | epoch 187 | 800/ 1568 batches | lr 30.00 | ms/batch 307.16 | loss 4.33 | ppl 75.93 | epoch 187 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.53 | loss 4.35 | ppl 77.65 | epoch 187 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.56 | loss 4.33 | ppl 75.89 | epoch 187 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.65 | loss 4.31 | ppl 74.66 | epoch 187 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.50 | loss 4.31 | ppl 74.19 ----------------------------------------------------------------------------------------- | end of epoch 187 | time: 515.29s | valid loss 4.79 | valid ppl 120.71 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 188 | 200/ 1568 batches | lr 30.00 | ms/batch 308.41 | loss 4.36 | ppl 78.28 | epoch 188 | 400/ 1568 batches | lr 30.00 | ms/batch 301.40 | loss 4.35 | ppl 77.19 | epoch 188 | 600/ 1568 batches | lr 30.00 | ms/batch 299.39 | loss 4.30 | ppl 73.94 | epoch 188 | 800/ 1568 batches | lr 30.00 | ms/batch 307.38 | loss 4.34 | ppl 77.03 | epoch 188 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.77 | loss 4.36 | ppl 78.07 | epoch 188 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.51 | loss 4.32 | ppl 75.40 | epoch 188 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.69 | loss 4.31 | ppl 74.30 | epoch 188 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.53 | loss 4.30 | ppl 74.06 ----------------------------------------------------------------------------------------- | end of epoch 188 | time: 514.62s | valid loss 4.79 | valid ppl 120.69 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 189 | 200/ 1568 batches | lr 30.00 | ms/batch 301.84 | loss 4.35 | ppl 77.40 | epoch 189 | 400/ 1568 batches | lr 30.00 | ms/batch 298.83 | loss 4.33 | ppl 75.67 | epoch 189 | 600/ 1568 batches | lr 30.00 | ms/batch 301.64 | loss 4.32 | ppl 75.27 | epoch 189 | 800/ 1568 batches | lr 30.00 | ms/batch 303.39 | loss 4.32 | ppl 75.39 | epoch 189 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.18 | loss 4.38 | ppl 79.83 | epoch 189 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.59 | loss 4.31 | ppl 74.72 | epoch 189 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.19 | loss 4.32 | ppl 74.94 | epoch 189 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.67 | loss 4.30 | ppl 73.89 ----------------------------------------------------------------------------------------- | end of epoch 189 | time: 515.23s | valid loss 4.79 | valid ppl 120.67 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 190 | 200/ 1568 batches | lr 30.00 | ms/batch 305.65 | loss 4.38 | ppl 79.59 | epoch 190 | 400/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.34 | ppl 76.34 | epoch 190 | 600/ 1568 batches | lr 30.00 | ms/batch 294.58 | loss 4.30 | ppl 73.97 | epoch 190 | 800/ 1568 batches | lr 30.00 | ms/batch 299.74 | loss 4.32 | ppl 74.89 | epoch 190 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.99 | loss 4.39 | ppl 80.52 | epoch 190 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.54 | loss 4.31 | ppl 74.40 | epoch 190 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.32 | ppl 75.04 | epoch 190 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.87 | loss 4.29 | ppl 72.91 ----------------------------------------------------------------------------------------- | end of epoch 190 | time: 515.30s | valid loss 4.79 | valid ppl 120.66 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 191 | 200/ 1568 batches | lr 30.00 | ms/batch 306.51 | loss 4.37 | ppl 79.33 | epoch 191 | 400/ 1568 batches | lr 30.00 | ms/batch 298.01 | loss 4.34 | ppl 76.72 | epoch 191 | 600/ 1568 batches | lr 30.00 | ms/batch 297.21 | loss 4.31 | ppl 74.26 | epoch 191 | 800/ 1568 batches | lr 30.00 | ms/batch 303.65 | loss 4.31 | ppl 74.40 | epoch 191 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.65 | loss 4.38 | ppl 79.81 | epoch 191 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.00 | loss 4.33 | ppl 75.58 | epoch 191 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.50 | loss 4.30 | ppl 73.49 | epoch 191 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.55 | loss 4.30 | ppl 73.64 ----------------------------------------------------------------------------------------- | end of epoch 191 | time: 514.78s | valid loss 4.79 | valid ppl 120.64 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 192 | 200/ 1568 batches | lr 30.00 | ms/batch 303.28 | loss 4.36 | ppl 78.35 | epoch 192 | 400/ 1568 batches | lr 30.00 | ms/batch 303.69 | loss 4.34 | ppl 76.69 | epoch 192 | 600/ 1568 batches | lr 30.00 | ms/batch 303.22 | loss 4.32 | ppl 75.01 | epoch 192 | 800/ 1568 batches | lr 30.00 | ms/batch 303.72 | loss 4.34 | ppl 76.48 | epoch 192 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.54 | loss 4.35 | ppl 77.77 | epoch 192 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.60 | loss 4.32 | ppl 74.96 | epoch 192 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.32 | loss 4.31 | ppl 74.42 | epoch 192 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.93 | loss 4.28 | ppl 72.28 ----------------------------------------------------------------------------------------- | end of epoch 192 | time: 514.83s | valid loss 4.79 | valid ppl 120.62 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 193 | 200/ 1568 batches | lr 30.00 | ms/batch 304.03 | loss 4.36 | ppl 78.46 | epoch 193 | 400/ 1568 batches | lr 30.00 | ms/batch 303.16 | loss 4.33 | ppl 75.92 | epoch 193 | 600/ 1568 batches | lr 30.00 | ms/batch 301.88 | loss 4.30 | ppl 73.42 | epoch 193 | 800/ 1568 batches | lr 30.00 | ms/batch 305.87 | loss 4.32 | ppl 75.52 | epoch 193 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.20 | loss 4.35 | ppl 77.80 | epoch 193 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.50 | loss 4.32 | ppl 75.55 | epoch 193 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.00 | loss 4.30 | ppl 73.76 | epoch 193 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.05 | loss 4.30 | ppl 73.34 ----------------------------------------------------------------------------------------- | end of epoch 193 | time: 514.63s | valid loss 4.79 | valid ppl 120.61 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 194 | 200/ 1568 batches | lr 30.00 | ms/batch 303.91 | loss 4.40 | ppl 81.25 | epoch 194 | 400/ 1568 batches | lr 30.00 | ms/batch 298.09 | loss 4.32 | ppl 75.46 | epoch 194 | 600/ 1568 batches | lr 30.00 | ms/batch 303.61 | loss 4.30 | ppl 73.72 | epoch 194 | 800/ 1568 batches | lr 30.00 | ms/batch 304.42 | loss 4.32 | ppl 75.24 | epoch 194 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.84 | loss 4.37 | ppl 78.90 | epoch 194 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.49 | loss 4.32 | ppl 74.99 | epoch 194 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.35 | loss 4.30 | ppl 73.47 | epoch 194 | 1600/ 1568 batches | lr 30.00 | ms/batch 297.14 | loss 4.28 | ppl 72.10 ----------------------------------------------------------------------------------------- | end of epoch 194 | time: 515.16s | valid loss 4.79 | valid ppl 120.59 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 195 | 200/ 1568 batches | lr 30.00 | ms/batch 301.73 | loss 4.36 | ppl 78.55 | epoch 195 | 400/ 1568 batches | lr 30.00 | ms/batch 303.64 | loss 4.32 | ppl 75.48 | epoch 195 | 600/ 1568 batches | lr 30.00 | ms/batch 302.14 | loss 4.30 | ppl 73.40 | epoch 195 | 800/ 1568 batches | lr 30.00 | ms/batch 301.55 | loss 4.32 | ppl 74.89 | epoch 195 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.37 | ppl 78.71 | epoch 195 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.97 | loss 4.32 | ppl 74.89 | epoch 195 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.54 | loss 4.31 | ppl 74.14 | epoch 195 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.88 | loss 4.29 | ppl 72.86 ----------------------------------------------------------------------------------------- | end of epoch 195 | time: 515.01s | valid loss 4.79 | valid ppl 120.57 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 196 | 200/ 1568 batches | lr 30.00 | ms/batch 302.98 | loss 4.36 | ppl 78.16 | epoch 196 | 400/ 1568 batches | lr 30.00 | ms/batch 296.56 | loss 4.32 | ppl 75.45 | epoch 196 | 600/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.30 | ppl 73.89 | epoch 196 | 800/ 1568 batches | lr 30.00 | ms/batch 301.83 | loss 4.31 | ppl 74.53 | epoch 196 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.87 | loss 4.37 | ppl 79.20 | epoch 196 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.90 | loss 4.32 | ppl 75.50 | epoch 196 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.93 | loss 4.30 | ppl 73.44 | epoch 196 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.90 | loss 4.28 | ppl 72.15 ----------------------------------------------------------------------------------------- | end of epoch 196 | time: 514.86s | valid loss 4.79 | valid ppl 120.56 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 197 | 200/ 1568 batches | lr 30.00 | ms/batch 299.43 | loss 4.35 | ppl 77.57 | epoch 197 | 400/ 1568 batches | lr 30.00 | ms/batch 300.83 | loss 4.32 | ppl 75.34 | epoch 197 | 600/ 1568 batches | lr 30.00 | ms/batch 301.11 | loss 4.31 | ppl 74.39 | epoch 197 | 800/ 1568 batches | lr 30.00 | ms/batch 302.84 | loss 4.31 | ppl 74.73 | epoch 197 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.52 | loss 4.38 | ppl 79.69 | epoch 197 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.51 | loss 4.31 | ppl 74.54 | epoch 197 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.70 | loss 4.30 | ppl 73.95 | epoch 197 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.13 | loss 4.29 | ppl 72.70 ----------------------------------------------------------------------------------------- | end of epoch 197 | time: 514.99s | valid loss 4.79 | valid ppl 120.54 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 198 | 200/ 1568 batches | lr 30.00 | ms/batch 305.75 | loss 4.37 | ppl 78.84 | epoch 198 | 400/ 1568 batches | lr 30.00 | ms/batch 300.21 | loss 4.33 | ppl 75.58 | epoch 198 | 600/ 1568 batches | lr 30.00 | ms/batch 303.49 | loss 4.29 | ppl 72.81 | epoch 198 | 800/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.31 | ppl 74.53 | epoch 198 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.12 | loss 4.36 | ppl 78.39 | epoch 198 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.12 | loss 4.32 | ppl 75.14 | epoch 198 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.37 | loss 4.28 | ppl 72.58 | epoch 198 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.29 | ppl 72.96 ----------------------------------------------------------------------------------------- | end of epoch 198 | time: 514.62s | valid loss 4.79 | valid ppl 120.53 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 199 | 200/ 1568 batches | lr 30.00 | ms/batch 301.03 | loss 4.35 | ppl 77.60 | epoch 199 | 400/ 1568 batches | lr 30.00 | ms/batch 301.51 | loss 4.32 | ppl 75.35 | epoch 199 | 600/ 1568 batches | lr 30.00 | ms/batch 301.31 | loss 4.31 | ppl 74.08 | epoch 199 | 800/ 1568 batches | lr 30.00 | ms/batch 302.19 | loss 4.29 | ppl 73.02 | epoch 199 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.44 | loss 4.36 | ppl 78.07 | epoch 199 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.04 | loss 4.31 | ppl 74.14 | epoch 199 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.10 | loss 4.28 | ppl 72.33 | epoch 199 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.84 | loss 4.27 | ppl 71.52 ----------------------------------------------------------------------------------------- | end of epoch 199 | time: 514.80s | valid loss 4.79 | valid ppl 120.51 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 200 | 200/ 1568 batches | lr 30.00 | ms/batch 306.80 | loss 4.37 | ppl 78.74 | epoch 200 | 400/ 1568 batches | lr 30.00 | ms/batch 301.94 | loss 4.32 | ppl 75.45 | epoch 200 | 600/ 1568 batches | lr 30.00 | ms/batch 300.20 | loss 4.28 | ppl 72.58 | epoch 200 | 800/ 1568 batches | lr 30.00 | ms/batch 300.54 | loss 4.30 | ppl 73.75 | epoch 200 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.35 | ppl 77.53 | epoch 200 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.23 | loss 4.29 | ppl 72.88 | epoch 200 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.18 | loss 4.29 | ppl 73.22 | epoch 200 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.28 | ppl 72.36 ----------------------------------------------------------------------------------------- | end of epoch 200 | time: 514.75s | valid loss 4.79 | valid ppl 120.50 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 201 | 200/ 1568 batches | lr 30.00 | ms/batch 304.65 | loss 4.34 | ppl 76.85 | epoch 201 | 400/ 1568 batches | lr 30.00 | ms/batch 300.50 | loss 4.31 | ppl 74.50 | epoch 201 | 600/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.28 | ppl 72.31 | epoch 201 | 800/ 1568 batches | lr 30.00 | ms/batch 298.03 | loss 4.30 | ppl 73.79 | epoch 201 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.81 | loss 4.37 | ppl 78.70 | epoch 201 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.60 | loss 4.31 | ppl 74.23 | epoch 201 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.30 | loss 4.28 | ppl 71.94 | epoch 201 | 1600/ 1568 batches | lr 30.00 | ms/batch 294.34 | loss 4.28 | ppl 72.16 ----------------------------------------------------------------------------------------- | end of epoch 201 | time: 514.93s | valid loss 4.79 | valid ppl 120.49 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 202 | 200/ 1568 batches | lr 30.00 | ms/batch 305.10 | loss 4.36 | ppl 77.87 | epoch 202 | 400/ 1568 batches | lr 30.00 | ms/batch 305.67 | loss 4.32 | ppl 75.37 | epoch 202 | 600/ 1568 batches | lr 30.00 | ms/batch 301.76 | loss 4.28 | ppl 72.09 | epoch 202 | 800/ 1568 batches | lr 30.00 | ms/batch 297.87 | loss 4.31 | ppl 74.69 | epoch 202 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.12 | loss 4.34 | ppl 76.67 | epoch 202 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.63 | loss 4.31 | ppl 74.41 | epoch 202 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.66 | loss 4.28 | ppl 71.95 | epoch 202 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.71 | loss 4.26 | ppl 71.02 ----------------------------------------------------------------------------------------- | end of epoch 202 | time: 514.22s | valid loss 4.79 | valid ppl 120.48 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 203 | 200/ 1568 batches | lr 30.00 | ms/batch 305.33 | loss 4.34 | ppl 76.43 | epoch 203 | 400/ 1568 batches | lr 30.00 | ms/batch 293.78 | loss 4.33 | ppl 76.10 | epoch 203 | 600/ 1568 batches | lr 30.00 | ms/batch 300.23 | loss 4.28 | ppl 71.94 | epoch 203 | 800/ 1568 batches | lr 30.00 | ms/batch 299.57 | loss 4.31 | ppl 74.57 | epoch 203 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.81 | loss 4.36 | ppl 78.48 | epoch 203 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.49 | loss 4.31 | ppl 74.32 | epoch 203 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.75 | loss 4.28 | ppl 72.56 | epoch 203 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.01 | loss 4.27 | ppl 71.18 ----------------------------------------------------------------------------------------- | end of epoch 203 | time: 514.53s | valid loss 4.79 | valid ppl 120.46 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 204 | 200/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.35 | ppl 77.24 | epoch 204 | 400/ 1568 batches | lr 30.00 | ms/batch 302.22 | loss 4.34 | ppl 76.54 | epoch 204 | 600/ 1568 batches | lr 30.00 | ms/batch 298.36 | loss 4.28 | ppl 71.99 | epoch 204 | 800/ 1568 batches | lr 30.00 | ms/batch 304.74 | loss 4.30 | ppl 73.47 | epoch 204 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.18 | loss 4.36 | ppl 78.07 | epoch 204 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.67 | loss 4.30 | ppl 73.48 | epoch 204 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.15 | loss 4.29 | ppl 72.69 | epoch 204 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.67 | loss 4.26 | ppl 70.66 ----------------------------------------------------------------------------------------- | end of epoch 204 | time: 515.11s | valid loss 4.79 | valid ppl 120.45 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 205 | 200/ 1568 batches | lr 30.00 | ms/batch 306.54 | loss 4.34 | ppl 76.52 | epoch 205 | 400/ 1568 batches | lr 30.00 | ms/batch 301.08 | loss 4.32 | ppl 75.13 | epoch 205 | 600/ 1568 batches | lr 30.00 | ms/batch 301.52 | loss 4.29 | ppl 72.96 | epoch 205 | 800/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.31 | ppl 74.48 | epoch 205 | 1000/ 1568 batches | lr 30.00 | ms/batch 307.33 | loss 4.34 | ppl 76.47 | epoch 205 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.33 | loss 4.29 | ppl 73.07 | epoch 205 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.82 | loss 4.28 | ppl 72.54 | epoch 205 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.21 | loss 4.24 | ppl 69.12 ----------------------------------------------------------------------------------------- | end of epoch 205 | time: 514.02s | valid loss 4.79 | valid ppl 120.44 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 206 | 200/ 1568 batches | lr 30.00 | ms/batch 302.21 | loss 4.34 | ppl 76.94 | epoch 206 | 400/ 1568 batches | lr 30.00 | ms/batch 304.69 | loss 4.30 | ppl 73.56 | epoch 206 | 600/ 1568 batches | lr 30.00 | ms/batch 299.90 | loss 4.28 | ppl 72.56 | epoch 206 | 800/ 1568 batches | lr 30.00 | ms/batch 305.31 | loss 4.30 | ppl 73.91 | epoch 206 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.90 | loss 4.35 | ppl 77.67 | epoch 206 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.73 | loss 4.31 | ppl 74.36 | epoch 206 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.29 | ppl 72.61 | epoch 206 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.27 | ppl 71.57 ----------------------------------------------------------------------------------------- | end of epoch 206 | time: 514.79s | valid loss 4.79 | valid ppl 120.43 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 207 | 200/ 1568 batches | lr 30.00 | ms/batch 306.54 | loss 4.34 | ppl 76.52 | epoch 207 | 400/ 1568 batches | lr 30.00 | ms/batch 297.89 | loss 4.33 | ppl 75.84 | epoch 207 | 600/ 1568 batches | lr 30.00 | ms/batch 300.25 | loss 4.29 | ppl 72.94 | epoch 207 | 800/ 1568 batches | lr 30.00 | ms/batch 303.04 | loss 4.29 | ppl 73.00 | epoch 207 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.71 | loss 4.34 | ppl 76.78 | epoch 207 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.13 | loss 4.29 | ppl 73.08 | epoch 207 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.21 | loss 4.28 | ppl 72.15 | epoch 207 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.14 | loss 4.28 | ppl 72.45 ----------------------------------------------------------------------------------------- | end of epoch 207 | time: 514.63s | valid loss 4.79 | valid ppl 120.42 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 208 | 200/ 1568 batches | lr 30.00 | ms/batch 304.12 | loss 4.34 | ppl 76.92 | epoch 208 | 400/ 1568 batches | lr 30.00 | ms/batch 300.01 | loss 4.33 | ppl 75.99 | epoch 208 | 600/ 1568 batches | lr 30.00 | ms/batch 297.66 | loss 4.28 | ppl 72.50 | epoch 208 | 800/ 1568 batches | lr 30.00 | ms/batch 301.62 | loss 4.28 | ppl 72.06 | epoch 208 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.35 | ppl 77.46 | epoch 208 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.88 | loss 4.32 | ppl 74.92 | epoch 208 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.32 | loss 4.27 | ppl 71.34 | epoch 208 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.86 | loss 4.28 | ppl 72.02 ----------------------------------------------------------------------------------------- | end of epoch 208 | time: 514.81s | valid loss 4.79 | valid ppl 120.41 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 209 | 200/ 1568 batches | lr 30.00 | ms/batch 302.66 | loss 4.36 | ppl 78.19 | epoch 209 | 400/ 1568 batches | lr 30.00 | ms/batch 302.58 | loss 4.29 | ppl 73.30 | epoch 209 | 600/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.30 | ppl 73.63 | epoch 209 | 800/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.31 | ppl 74.15 | epoch 209 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.05 | loss 4.35 | ppl 77.22 | epoch 209 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.59 | loss 4.31 | ppl 74.29 | epoch 209 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.11 | loss 4.27 | ppl 71.42 | epoch 209 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.80 | loss 4.26 | ppl 70.63 ----------------------------------------------------------------------------------------- | end of epoch 209 | time: 514.79s | valid loss 4.79 | valid ppl 120.40 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 210 | 200/ 1568 batches | lr 30.00 | ms/batch 300.90 | loss 4.33 | ppl 76.00 | epoch 210 | 400/ 1568 batches | lr 30.00 | ms/batch 302.11 | loss 4.31 | ppl 74.74 | epoch 210 | 600/ 1568 batches | lr 30.00 | ms/batch 298.74 | loss 4.29 | ppl 73.30 | epoch 210 | 800/ 1568 batches | lr 30.00 | ms/batch 299.83 | loss 4.30 | ppl 73.68 | epoch 210 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.87 | loss 4.33 | ppl 76.08 | epoch 210 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.81 | loss 4.29 | ppl 73.14 | epoch 210 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.41 | loss 4.30 | ppl 73.35 | epoch 210 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.10 | loss 4.27 | ppl 71.78 ----------------------------------------------------------------------------------------- | end of epoch 210 | time: 515.04s | valid loss 4.79 | valid ppl 120.39 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 211 | 200/ 1568 batches | lr 30.00 | ms/batch 303.75 | loss 4.34 | ppl 76.81 | epoch 211 | 400/ 1568 batches | lr 30.00 | ms/batch 299.14 | loss 4.31 | ppl 74.71 | epoch 211 | 600/ 1568 batches | lr 30.00 | ms/batch 297.58 | loss 4.26 | ppl 71.12 | epoch 211 | 800/ 1568 batches | lr 30.00 | ms/batch 302.20 | loss 4.28 | ppl 72.04 | epoch 211 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.43 | loss 4.33 | ppl 76.30 | epoch 211 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.22 | loss 4.30 | ppl 73.63 | epoch 211 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.28 | ppl 71.93 | epoch 211 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.26 | ppl 70.73 ----------------------------------------------------------------------------------------- | end of epoch 211 | time: 514.45s | valid loss 4.79 | valid ppl 120.39 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 212 | 200/ 1568 batches | lr 30.00 | ms/batch 303.11 | loss 4.34 | ppl 76.86 | epoch 212 | 400/ 1568 batches | lr 30.00 | ms/batch 299.18 | loss 4.30 | ppl 73.61 | epoch 212 | 600/ 1568 batches | lr 30.00 | ms/batch 304.18 | loss 4.28 | ppl 72.28 | epoch 212 | 800/ 1568 batches | lr 30.00 | ms/batch 304.79 | loss 4.28 | ppl 72.46 | epoch 212 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.88 | loss 4.33 | ppl 75.59 | epoch 212 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.74 | loss 4.32 | ppl 74.84 | epoch 212 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.53 | loss 4.29 | ppl 72.67 | epoch 212 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.41 | loss 4.26 | ppl 70.85 ----------------------------------------------------------------------------------------- | end of epoch 212 | time: 514.12s | valid loss 4.79 | valid ppl 120.38 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 213 | 200/ 1568 batches | lr 30.00 | ms/batch 305.78 | loss 4.35 | ppl 77.17 | epoch 213 | 400/ 1568 batches | lr 30.00 | ms/batch 299.37 | loss 4.31 | ppl 74.63 | epoch 213 | 600/ 1568 batches | lr 30.00 | ms/batch 302.46 | loss 4.28 | ppl 72.09 | epoch 213 | 800/ 1568 batches | lr 30.00 | ms/batch 301.80 | loss 4.30 | ppl 73.77 | epoch 213 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.57 | loss 4.33 | ppl 75.68 | epoch 213 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.02 | loss 4.30 | ppl 73.71 | epoch 213 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.72 | loss 4.28 | ppl 72.00 | epoch 213 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.79 | loss 4.27 | ppl 71.19 ----------------------------------------------------------------------------------------- | end of epoch 213 | time: 514.60s | valid loss 4.79 | valid ppl 120.37 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 214 | 200/ 1568 batches | lr 30.00 | ms/batch 307.45 | loss 4.33 | ppl 76.08 | epoch 214 | 400/ 1568 batches | lr 30.00 | ms/batch 301.60 | loss 4.29 | ppl 72.85 | epoch 214 | 600/ 1568 batches | lr 30.00 | ms/batch 299.99 | loss 4.26 | ppl 71.11 | epoch 214 | 800/ 1568 batches | lr 30.00 | ms/batch 300.51 | loss 4.29 | ppl 72.96 | epoch 214 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.72 | loss 4.34 | ppl 76.51 | epoch 214 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.80 | loss 4.29 | ppl 72.96 | epoch 214 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.39 | loss 4.27 | ppl 71.80 | epoch 214 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.73 | loss 4.25 | ppl 70.24 ----------------------------------------------------------------------------------------- | end of epoch 214 | time: 514.10s | valid loss 4.79 | valid ppl 120.37 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 215 | 200/ 1568 batches | lr 30.00 | ms/batch 308.32 | loss 4.34 | ppl 76.51 | epoch 215 | 400/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.31 | ppl 74.39 | epoch 215 | 600/ 1568 batches | lr 30.00 | ms/batch 300.00 | loss 4.27 | ppl 71.86 | epoch 215 | 800/ 1568 batches | lr 30.00 | ms/batch 301.15 | loss 4.29 | ppl 72.84 | epoch 215 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.33 | ppl 76.28 | epoch 215 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.63 | loss 4.29 | ppl 72.97 | epoch 215 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.35 | loss 4.27 | ppl 71.27 | epoch 215 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.95 | loss 4.27 | ppl 71.51 ----------------------------------------------------------------------------------------- | end of epoch 215 | time: 514.50s | valid loss 4.79 | valid ppl 120.36 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 216 | 200/ 1568 batches | lr 30.00 | ms/batch 308.71 | loss 4.32 | ppl 75.15 | epoch 216 | 400/ 1568 batches | lr 30.00 | ms/batch 297.24 | loss 4.31 | ppl 74.18 | epoch 216 | 600/ 1568 batches | lr 30.00 | ms/batch 300.76 | loss 4.28 | ppl 72.55 | epoch 216 | 800/ 1568 batches | lr 30.00 | ms/batch 300.94 | loss 4.28 | ppl 72.27 | epoch 216 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.69 | loss 4.34 | ppl 76.73 | epoch 216 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.75 | loss 4.31 | ppl 74.20 | epoch 216 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.25 | loss 4.26 | ppl 70.67 | epoch 216 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.25 | ppl 70.42 ----------------------------------------------------------------------------------------- | end of epoch 216 | time: 513.69s | valid loss 4.79 | valid ppl 120.36 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 217 | 200/ 1568 batches | lr 30.00 | ms/batch 305.64 | loss 4.31 | ppl 74.59 | epoch 217 | 400/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.30 | ppl 73.42 | epoch 217 | 600/ 1568 batches | lr 30.00 | ms/batch 298.48 | loss 4.27 | ppl 71.85 | epoch 217 | 800/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.29 | ppl 73.03 | epoch 217 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.83 | loss 4.34 | ppl 77.04 | epoch 217 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.66 | loss 4.28 | ppl 72.52 | epoch 217 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.81 | loss 4.27 | ppl 71.64 | epoch 217 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.26 | ppl 70.96 ----------------------------------------------------------------------------------------- | end of epoch 217 | time: 514.27s | valid loss 4.79 | valid ppl 120.35 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 218 | 200/ 1568 batches | lr 30.00 | ms/batch 300.90 | loss 4.32 | ppl 74.83 | epoch 218 | 400/ 1568 batches | lr 30.00 | ms/batch 298.07 | loss 4.31 | ppl 74.43 | epoch 218 | 600/ 1568 batches | lr 30.00 | ms/batch 304.16 | loss 4.27 | ppl 71.36 | epoch 218 | 800/ 1568 batches | lr 30.00 | ms/batch 299.61 | loss 4.27 | ppl 71.29 | epoch 218 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.44 | loss 4.34 | ppl 76.64 | epoch 218 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.29 | ppl 73.29 | epoch 218 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.28 | loss 4.25 | ppl 70.05 | epoch 218 | 1600/ 1568 batches | lr 30.00 | ms/batch 307.47 | loss 4.25 | ppl 70.27 ----------------------------------------------------------------------------------------- | end of epoch 218 | time: 514.22s | valid loss 4.79 | valid ppl 120.34 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 219 | 200/ 1568 batches | lr 30.00 | ms/batch 303.71 | loss 4.31 | ppl 74.62 | epoch 219 | 400/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.31 | ppl 74.19 | epoch 219 | 600/ 1568 batches | lr 30.00 | ms/batch 296.47 | loss 4.26 | ppl 70.90 | epoch 219 | 800/ 1568 batches | lr 30.00 | ms/batch 297.51 | loss 4.28 | ppl 71.90 | epoch 219 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.77 | loss 4.34 | ppl 76.75 | epoch 219 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.83 | loss 4.28 | ppl 72.43 | epoch 219 | 1400/ 1568 batches | lr 30.00 | ms/batch 307.15 | loss 4.27 | ppl 71.79 | epoch 219 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.25 | ppl 70.38 ----------------------------------------------------------------------------------------- | end of epoch 219 | time: 513.33s | valid loss 4.79 | valid ppl 120.34 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 220 | 200/ 1568 batches | lr 30.00 | ms/batch 303.53 | loss 4.32 | ppl 75.30 | epoch 220 | 400/ 1568 batches | lr 30.00 | ms/batch 301.30 | loss 4.30 | ppl 73.85 | epoch 220 | 600/ 1568 batches | lr 30.00 | ms/batch 301.65 | loss 4.25 | ppl 70.33 | epoch 220 | 800/ 1568 batches | lr 30.00 | ms/batch 303.38 | loss 4.29 | ppl 72.73 | epoch 220 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.22 | loss 4.34 | ppl 76.96 | epoch 220 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.14 | loss 4.29 | ppl 73.18 | epoch 220 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.45 | loss 4.27 | ppl 71.59 | epoch 220 | 1600/ 1568 batches | lr 30.00 | ms/batch 294.79 | loss 4.24 | ppl 69.68 ----------------------------------------------------------------------------------------- | end of epoch 220 | time: 514.15s | valid loss 4.79 | valid ppl 120.33 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 221 | 200/ 1568 batches | lr 30.00 | ms/batch 305.68 | loss 4.33 | ppl 75.66 | epoch 221 | 400/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.30 | ppl 73.53 | epoch 221 | 600/ 1568 batches | lr 30.00 | ms/batch 299.76 | loss 4.27 | ppl 71.71 | epoch 221 | 800/ 1568 batches | lr 30.00 | ms/batch 299.72 | loss 4.27 | ppl 71.62 | epoch 221 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.40 | loss 4.33 | ppl 75.67 | epoch 221 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.26 | loss 4.27 | ppl 71.72 | epoch 221 | 1400/ 1568 batches | lr 30.00 | ms/batch 295.27 | loss 4.28 | ppl 72.35 | epoch 221 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.40 | loss 4.25 | ppl 69.79 ----------------------------------------------------------------------------------------- | end of epoch 221 | time: 514.11s | valid loss 4.79 | valid ppl 120.32 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 222 | 200/ 1568 batches | lr 30.00 | ms/batch 303.76 | loss 4.32 | ppl 74.86 | epoch 222 | 400/ 1568 batches | lr 30.00 | ms/batch 298.55 | loss 4.30 | ppl 73.59 | epoch 222 | 600/ 1568 batches | lr 30.00 | ms/batch 298.16 | loss 4.26 | ppl 70.65 | epoch 222 | 800/ 1568 batches | lr 30.00 | ms/batch 302.09 | loss 4.27 | ppl 71.46 | epoch 222 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.98 | loss 4.34 | ppl 76.87 | epoch 222 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.20 | loss 4.27 | ppl 71.56 | epoch 222 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.77 | loss 4.26 | ppl 70.94 | epoch 222 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.67 | loss 4.26 | ppl 70.53 ----------------------------------------------------------------------------------------- | end of epoch 222 | time: 514.34s | valid loss 4.79 | valid ppl 120.32 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 223 | 200/ 1568 batches | lr 30.00 | ms/batch 299.25 | loss 4.31 | ppl 74.15 | epoch 223 | 400/ 1568 batches | lr 30.00 | ms/batch 304.93 | loss 4.29 | ppl 73.08 | epoch 223 | 600/ 1568 batches | lr 30.00 | ms/batch 297.69 | loss 4.27 | ppl 71.21 | epoch 223 | 800/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.28 | ppl 71.95 | epoch 223 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.75 | loss 4.31 | ppl 74.17 | epoch 223 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.89 | loss 4.27 | ppl 71.28 | epoch 223 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.49 | loss 4.26 | ppl 71.14 | epoch 223 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.47 | loss 4.26 | ppl 70.49 ----------------------------------------------------------------------------------------- | end of epoch 223 | time: 513.90s | valid loss 4.79 | valid ppl 120.32 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 224 | 200/ 1568 batches | lr 30.00 | ms/batch 304.03 | loss 4.32 | ppl 74.85 | epoch 224 | 400/ 1568 batches | lr 30.00 | ms/batch 302.69 | loss 4.30 | ppl 73.78 | epoch 224 | 600/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.27 | ppl 71.53 | epoch 224 | 800/ 1568 batches | lr 30.00 | ms/batch 303.64 | loss 4.26 | ppl 70.57 | epoch 224 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.29 | loss 4.32 | ppl 75.24 | epoch 224 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.90 | loss 4.28 | ppl 72.47 | epoch 224 | 1400/ 1568 batches | lr 30.00 | ms/batch 307.07 | loss 4.28 | ppl 72.09 | epoch 224 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.58 | loss 4.23 | ppl 68.95 ----------------------------------------------------------------------------------------- | end of epoch 224 | time: 513.70s | valid loss 4.79 | valid ppl 120.32 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 225 | 200/ 1568 batches | lr 30.00 | ms/batch 302.78 | loss 4.32 | ppl 74.85 | epoch 225 | 400/ 1568 batches | lr 30.00 | ms/batch 300.35 | loss 4.28 | ppl 72.53 | epoch 225 | 600/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.27 | ppl 71.52 | epoch 225 | 800/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.29 | ppl 72.87 | epoch 225 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.23 | loss 4.32 | ppl 75.48 | epoch 225 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.27 | ppl 71.40 | epoch 225 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.26 | ppl 70.89 | epoch 225 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.94 | loss 4.24 | ppl 69.11 ----------------------------------------------------------------------------------------- | end of epoch 225 | time: 515.11s | valid loss 4.79 | valid ppl 120.31 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 226 | 200/ 1568 batches | lr 30.00 | ms/batch 302.10 | loss 4.32 | ppl 75.06 | epoch 226 | 400/ 1568 batches | lr 30.00 | ms/batch 305.14 | loss 4.28 | ppl 71.91 | epoch 226 | 600/ 1568 batches | lr 30.00 | ms/batch 301.65 | loss 4.25 | ppl 70.08 | epoch 226 | 800/ 1568 batches | lr 30.00 | ms/batch 303.80 | loss 4.30 | ppl 73.58 | epoch 226 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.55 | loss 4.32 | ppl 75.48 | epoch 226 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.02 | loss 4.27 | ppl 71.59 | epoch 226 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.92 | loss 4.26 | ppl 71.12 | epoch 226 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.04 | loss 4.25 | ppl 69.80 ----------------------------------------------------------------------------------------- | end of epoch 226 | time: 514.42s | valid loss 4.79 | valid ppl 120.31 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 227 | 200/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.31 | ppl 74.60 | epoch 227 | 400/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.28 | ppl 72.30 | epoch 227 | 600/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.27 | ppl 71.33 | epoch 227 | 800/ 1568 batches | lr 30.00 | ms/batch 302.96 | loss 4.27 | ppl 71.38 | epoch 227 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.13 | loss 4.33 | ppl 75.60 | epoch 227 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.27 | ppl 71.86 | epoch 227 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.28 | ppl 72.36 | epoch 227 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 4.23 | ppl 69.02 ----------------------------------------------------------------------------------------- | end of epoch 227 | time: 514.59s | valid loss 4.79 | valid ppl 120.30 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 228 | 200/ 1568 batches | lr 30.00 | ms/batch 308.40 | loss 4.32 | ppl 74.88 | epoch 228 | 400/ 1568 batches | lr 30.00 | ms/batch 302.47 | loss 4.29 | ppl 73.11 | epoch 228 | 600/ 1568 batches | lr 30.00 | ms/batch 294.91 | loss 4.26 | ppl 70.83 | epoch 228 | 800/ 1568 batches | lr 30.00 | ms/batch 303.12 | loss 4.30 | ppl 73.55 | epoch 228 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.09 | loss 4.34 | ppl 76.44 | epoch 228 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.29 | loss 4.26 | ppl 70.91 | epoch 228 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.01 | loss 4.26 | ppl 70.68 | epoch 228 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.27 | loss 4.24 | ppl 69.12 ----------------------------------------------------------------------------------------- | end of epoch 228 | time: 515.23s | valid loss 4.79 | valid ppl 120.30 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 229 | 200/ 1568 batches | lr 30.00 | ms/batch 305.52 | loss 4.31 | ppl 74.39 | epoch 229 | 400/ 1568 batches | lr 30.00 | ms/batch 300.64 | loss 4.27 | ppl 71.46 | epoch 229 | 600/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.26 | ppl 70.63 | epoch 229 | 800/ 1568 batches | lr 30.00 | ms/batch 299.14 | loss 4.26 | ppl 71.02 | epoch 229 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.77 | loss 4.33 | ppl 76.32 | epoch 229 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.45 | loss 4.25 | ppl 70.44 | epoch 229 | 1400/ 1568 batches | lr 30.00 | ms/batch 306.44 | loss 4.26 | ppl 70.87 | epoch 229 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.73 | loss 4.25 | ppl 69.89 ----------------------------------------------------------------------------------------- | end of epoch 229 | time: 514.71s | valid loss 4.79 | valid ppl 120.30 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 230 | 200/ 1568 batches | lr 30.00 | ms/batch 298.41 | loss 4.31 | ppl 74.76 | epoch 230 | 400/ 1568 batches | lr 30.00 | ms/batch 299.10 | loss 4.27 | ppl 71.70 | epoch 230 | 600/ 1568 batches | lr 30.00 | ms/batch 303.19 | loss 4.27 | ppl 71.36 | epoch 230 | 800/ 1568 batches | lr 30.00 | ms/batch 304.69 | loss 4.27 | ppl 71.61 | epoch 230 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.34 | ppl 76.46 | epoch 230 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.06 | loss 4.27 | ppl 71.34 | epoch 230 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.95 | loss 4.24 | ppl 69.71 | epoch 230 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.67 | loss 4.25 | ppl 70.20 ----------------------------------------------------------------------------------------- | end of epoch 230 | time: 514.62s | valid loss 4.79 | valid ppl 120.29 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 231 | 200/ 1568 batches | lr 30.00 | ms/batch 301.83 | loss 4.32 | ppl 74.85 | epoch 231 | 400/ 1568 batches | lr 30.00 | ms/batch 299.98 | loss 4.27 | ppl 71.67 | epoch 231 | 600/ 1568 batches | lr 30.00 | ms/batch 303.46 | loss 4.24 | ppl 69.53 | epoch 231 | 800/ 1568 batches | lr 30.00 | ms/batch 296.42 | loss 4.27 | ppl 71.18 | epoch 231 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.10 | loss 4.35 | ppl 77.27 | epoch 231 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.69 | loss 4.27 | ppl 71.22 | epoch 231 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.13 | loss 4.25 | ppl 70.43 | epoch 231 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.12 | loss 4.24 | ppl 69.08 ----------------------------------------------------------------------------------------- | end of epoch 231 | time: 515.09s | valid loss 4.79 | valid ppl 120.29 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 232 | 200/ 1568 batches | lr 30.00 | ms/batch 302.70 | loss 4.30 | ppl 73.84 | epoch 232 | 400/ 1568 batches | lr 30.00 | ms/batch 303.86 | loss 4.26 | ppl 70.70 | epoch 232 | 600/ 1568 batches | lr 30.00 | ms/batch 301.06 | loss 4.26 | ppl 70.53 | epoch 232 | 800/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.29 | ppl 73.11 | epoch 232 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.51 | loss 4.31 | ppl 74.20 | epoch 232 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 4.28 | ppl 72.00 | epoch 232 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.26 | ppl 70.49 | epoch 232 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.17 | loss 4.23 | ppl 68.50 ----------------------------------------------------------------------------------------- | end of epoch 232 | time: 514.73s | valid loss 4.79 | valid ppl 120.28 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 233 | 200/ 1568 batches | lr 30.00 | ms/batch 302.55 | loss 4.30 | ppl 73.49 | epoch 233 | 400/ 1568 batches | lr 30.00 | ms/batch 299.66 | loss 4.27 | ppl 71.76 | epoch 233 | 600/ 1568 batches | lr 30.00 | ms/batch 303.28 | loss 4.25 | ppl 70.27 | epoch 233 | 800/ 1568 batches | lr 30.00 | ms/batch 299.84 | loss 4.26 | ppl 70.67 | epoch 233 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.34 | ppl 76.58 | epoch 233 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.80 | loss 4.25 | ppl 70.27 | epoch 233 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.89 | loss 4.25 | ppl 69.78 | epoch 233 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.65 | loss 4.23 | ppl 68.72 ----------------------------------------------------------------------------------------- | end of epoch 233 | time: 516.35s | valid loss 4.79 | valid ppl 120.28 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 234 | 200/ 1568 batches | lr 30.00 | ms/batch 302.78 | loss 4.30 | ppl 73.42 | epoch 234 | 400/ 1568 batches | lr 30.00 | ms/batch 298.76 | loss 4.28 | ppl 72.05 | epoch 234 | 600/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.25 | ppl 69.94 | epoch 234 | 800/ 1568 batches | lr 30.00 | ms/batch 300.54 | loss 4.25 | ppl 70.16 | epoch 234 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.79 | loss 4.33 | ppl 76.03 | epoch 234 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.55 | loss 4.27 | ppl 71.57 | epoch 234 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.81 | loss 4.23 | ppl 68.80 | epoch 234 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.14 | loss 4.24 | ppl 69.23 ----------------------------------------------------------------------------------------- | end of epoch 234 | time: 515.27s | valid loss 4.79 | valid ppl 120.28 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 235 | 200/ 1568 batches | lr 30.00 | ms/batch 301.24 | loss 4.30 | ppl 73.59 | epoch 235 | 400/ 1568 batches | lr 30.00 | ms/batch 297.24 | loss 4.28 | ppl 72.12 | epoch 235 | 600/ 1568 batches | lr 30.00 | ms/batch 304.05 | loss 4.25 | ppl 70.17 | epoch 235 | 800/ 1568 batches | lr 30.00 | ms/batch 298.56 | loss 4.26 | ppl 70.88 | epoch 235 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.41 | loss 4.32 | ppl 75.18 | epoch 235 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.26 | ppl 70.81 | epoch 235 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.74 | loss 4.24 | ppl 69.15 | epoch 235 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.23 | ppl 68.84 ----------------------------------------------------------------------------------------- | end of epoch 235 | time: 515.09s | valid loss 4.79 | valid ppl 120.28 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 236 | 200/ 1568 batches | lr 30.00 | ms/batch 303.42 | loss 4.30 | ppl 73.76 | epoch 236 | 400/ 1568 batches | lr 30.00 | ms/batch 303.84 | loss 4.29 | ppl 72.71 | epoch 236 | 600/ 1568 batches | lr 30.00 | ms/batch 297.23 | loss 4.24 | ppl 69.41 | epoch 236 | 800/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.27 | ppl 71.32 | epoch 236 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.06 | loss 4.33 | ppl 75.66 | epoch 236 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.27 | ppl 71.38 | epoch 236 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.23 | loss 4.25 | ppl 70.16 | epoch 236 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.65 | loss 4.24 | ppl 69.47 ----------------------------------------------------------------------------------------- | end of epoch 236 | time: 514.82s | valid loss 4.79 | valid ppl 120.27 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 237 | 200/ 1568 batches | lr 30.00 | ms/batch 301.18 | loss 4.29 | ppl 73.30 | epoch 237 | 400/ 1568 batches | lr 30.00 | ms/batch 302.27 | loss 4.28 | ppl 72.28 | epoch 237 | 600/ 1568 batches | lr 30.00 | ms/batch 299.75 | loss 4.24 | ppl 69.32 | epoch 237 | 800/ 1568 batches | lr 30.00 | ms/batch 302.65 | loss 4.28 | ppl 71.88 | epoch 237 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.11 | loss 4.31 | ppl 74.54 | epoch 237 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.38 | loss 4.27 | ppl 71.27 | epoch 237 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.34 | loss 4.26 | ppl 71.06 | epoch 237 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.83 | loss 4.23 | ppl 68.77 ----------------------------------------------------------------------------------------- | end of epoch 237 | time: 515.10s | valid loss 4.79 | valid ppl 120.27 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 238 | 200/ 1568 batches | lr 30.00 | ms/batch 305.98 | loss 4.31 | ppl 74.26 | epoch 238 | 400/ 1568 batches | lr 30.00 | ms/batch 303.99 | loss 4.27 | ppl 71.68 | epoch 238 | 600/ 1568 batches | lr 30.00 | ms/batch 303.30 | loss 4.23 | ppl 68.83 | epoch 238 | 800/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.29 | ppl 73.00 | epoch 238 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.18 | loss 4.31 | ppl 74.29 | epoch 238 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.63 | loss 4.26 | ppl 70.55 | epoch 238 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.25 | ppl 69.90 | epoch 238 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.51 | loss 4.23 | ppl 68.57 ----------------------------------------------------------------------------------------- | end of epoch 238 | time: 515.29s | valid loss 4.79 | valid ppl 120.27 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 239 | 200/ 1568 batches | lr 30.00 | ms/batch 304.91 | loss 4.31 | ppl 74.52 | epoch 239 | 400/ 1568 batches | lr 30.00 | ms/batch 296.50 | loss 4.27 | ppl 71.71 | epoch 239 | 600/ 1568 batches | lr 30.00 | ms/batch 298.52 | loss 4.25 | ppl 70.14 | epoch 239 | 800/ 1568 batches | lr 30.00 | ms/batch 303.20 | loss 4.26 | ppl 71.06 | epoch 239 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.89 | loss 4.32 | ppl 75.52 | epoch 239 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.25 | ppl 69.90 | epoch 239 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.25 | ppl 70.29 | epoch 239 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.80 | loss 4.24 | ppl 69.22 ----------------------------------------------------------------------------------------- | end of epoch 239 | time: 514.93s | valid loss 4.79 | valid ppl 120.27 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 240 | 200/ 1568 batches | lr 30.00 | ms/batch 308.34 | loss 4.30 | ppl 73.58 | epoch 240 | 400/ 1568 batches | lr 30.00 | ms/batch 301.65 | loss 4.28 | ppl 72.44 | epoch 240 | 600/ 1568 batches | lr 30.00 | ms/batch 299.44 | loss 4.25 | ppl 70.31 | epoch 240 | 800/ 1568 batches | lr 30.00 | ms/batch 299.20 | loss 4.27 | ppl 71.33 | epoch 240 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.20 | loss 4.31 | ppl 74.14 | epoch 240 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.80 | loss 4.25 | ppl 70.19 | epoch 240 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.89 | loss 4.24 | ppl 69.64 | epoch 240 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.23 | ppl 68.53 ----------------------------------------------------------------------------------------- | end of epoch 240 | time: 515.20s | valid loss 4.79 | valid ppl 120.27 ----------------------------------------------------------------------------------------- Saving Averaged! 
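Every epoch here ends with `Saving Averaged!`. This suggests the run had already switched from plain SGD to averaged SGD (NT-ASGD) and was evaluating and saving the running average of the weights. The switch is non-monotone. Judging from the `nonmono=5` argument and my reading of awd-lstm-lm's `main.py`, it fires when the current validation loss is worse than the best loss recorded more than `nonmono` evaluations earlier. The function below paraphrases that condition; its name and signature are hypothetical.

```python
from typing import List


def should_switch_to_asgd(previous: List[float], current: float, nonmono: int = 5) -> bool:
    """Hypothetical paraphrase of a non-monotone NT-ASGD trigger.

    previous: validation losses from earlier evaluations, oldest first.
    current:  the validation loss just measured.
    Switch only once more than `nonmono` losses have been seen and the current
    loss is worse than the best loss recorded more than `nonmono` checks ago.
    """
    if len(previous) <= nonmono:
        return False
    return current > min(previous[:-nonmono])


history = [4.90, 4.85, 4.82, 4.81, 4.80, 4.80, 4.79]
print(should_switch_to_asgd(history, 4.86))  # worse than 4.85, recorded 6 checks ago
print(should_switch_to_asgd(history, 4.80))  # still better than that old best
```

As far as I can tell, after the switch the script evaluates and checkpoints the per-parameter running average that `torch.optim.ASGD` keeps in its optimizer state (the `ax` entry). That is what the `Saving Averaged!` message refers to.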
| epoch 241 | 200/ 1568 batches | lr 30.00 | ms/batch 305.87 | loss 4.28 | ppl 72.42 | epoch 241 | 400/ 1568 batches | lr 30.00 | ms/batch 300.03 | loss 4.28 | ppl 71.93 | epoch 241 | 600/ 1568 batches | lr 30.00 | ms/batch 297.74 | loss 4.25 | ppl 69.76 | epoch 241 | 800/ 1568 batches | lr 30.00 | ms/batch 304.32 | loss 4.26 | ppl 70.53 | epoch 241 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.38 | loss 4.31 | ppl 74.77 | epoch 241 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.25 | ppl 70.16 | epoch 241 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.22 | loss 4.25 | ppl 70.39 | epoch 241 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.24 | ppl 69.07 ----------------------------------------------------------------------------------------- | end of epoch 241 | time: 515.06s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 242 | 200/ 1568 batches | lr 30.00 | ms/batch 304.06 | loss 4.30 | ppl 73.46 | epoch 242 | 400/ 1568 batches | lr 30.00 | ms/batch 300.28 | loss 4.28 | ppl 72.36 | epoch 242 | 600/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.23 | ppl 68.73 | epoch 242 | 800/ 1568 batches | lr 30.00 | ms/batch 307.24 | loss 4.25 | ppl 70.32 | epoch 242 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.51 | loss 4.32 | ppl 75.14 | epoch 242 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.63 | loss 4.25 | ppl 70.35 | epoch 242 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.17 | loss 4.26 | ppl 70.96 | epoch 242 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.42 | loss 4.22 | ppl 68.04 ----------------------------------------------------------------------------------------- | end of epoch 242 | time: 514.61s | valid loss 4.79 | valid ppl 120.26 -----------------------------------------------------------------------------------------
| epoch 243 | 200/ 1568 batches | lr 30.00 | ms/batch 303.20 | loss 4.30 | ppl 73.70 | epoch 243 | 400/ 1568 batches | lr 30.00 | ms/batch 294.05 | loss 4.29 | ppl 73.07 | epoch 243 | 600/ 1568 batches | lr 30.00 | ms/batch 297.86 | loss 4.25 | ppl 70.07 | epoch 243 | 800/ 1568 batches | lr 30.00 | ms/batch 300.85 | loss 4.25 | ppl 70.21 | epoch 243 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.53 | loss 4.31 | ppl 74.67 | epoch 243 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.75 | loss 4.25 | ppl 70.44 | epoch 243 | 1400/ 1568 batches | lr 30.00 | ms/batch 296.99 | loss 4.24 | ppl 69.22 | epoch 243 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.25 | ppl 70.08 ----------------------------------------------------------------------------------------- | end of epoch 243 | time: 515.45s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 244 | 200/ 1568 batches | lr 30.00 | ms/batch 303.27 | loss 4.29 | ppl 73.00 | epoch 244 | 400/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.28 | ppl 72.26 | epoch 244 | 600/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.24 | ppl 69.71 | epoch 244 | 800/ 1568 batches | lr 30.00 | ms/batch 301.96 | loss 4.26 | ppl 70.46 | epoch 244 | 1000/ 1568 batches | lr 30.00 | ms/batch 295.09 | loss 4.32 | ppl 74.97 | epoch 244 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.99 | loss 4.25 | ppl 70.41 | epoch 244 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 4.24 | ppl 69.54 | epoch 244 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.35 | loss 4.22 | ppl 68.34 ----------------------------------------------------------------------------------------- | end of epoch 244 | time: 515.29s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged!
| epoch 245 | 200/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.29 | ppl 72.93 | epoch 245 | 400/ 1568 batches | lr 30.00 | ms/batch 298.84 | loss 4.27 | ppl 71.78 | epoch 245 | 600/ 1568 batches | lr 30.00 | ms/batch 302.74 | loss 4.24 | ppl 69.42 | epoch 245 | 800/ 1568 batches | lr 30.00 | ms/batch 299.03 | loss 4.25 | ppl 70.26 | epoch 245 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.71 | loss 4.32 | ppl 75.14 | epoch 245 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.82 | loss 4.26 | ppl 70.55 | epoch 245 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.37 | loss 4.24 | ppl 69.52 | epoch 245 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 4.22 | ppl 67.95 ----------------------------------------------------------------------------------------- | end of epoch 245 | time: 515.41s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 246 | 200/ 1568 batches | lr 30.00 | ms/batch 301.49 | loss 4.29 | ppl 72.82 | epoch 246 | 400/ 1568 batches | lr 30.00 | ms/batch 300.24 | loss 4.29 | ppl 72.86 | epoch 246 | 600/ 1568 batches | lr 30.00 | ms/batch 306.68 | loss 4.25 | ppl 70.24 | epoch 246 | 800/ 1568 batches | lr 30.00 | ms/batch 300.02 | loss 4.25 | ppl 70.20 | epoch 246 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.41 | loss 4.32 | ppl 74.97 | epoch 246 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.32 | loss 4.25 | ppl 70.45 | epoch 246 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.50 | loss 4.24 | ppl 69.15 | epoch 246 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.61 | loss 4.23 | ppl 68.90 ----------------------------------------------------------------------------------------- | end of epoch 246 | time: 517.49s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 247 | 200/ 1568 batches | lr 30.00 | ms/batch 301.77 | loss 4.28 | ppl 72.09 | epoch 247 | 400/ 1568 batches | lr 30.00 | ms/batch 300.49 | loss 4.28 | ppl 72.27 | epoch 247 | 600/ 1568 batches | lr 30.00 | ms/batch 301.07 | loss 4.26 | ppl 70.75 | epoch 247 | 800/ 1568 batches | lr 30.00 | ms/batch 304.40 | loss 4.24 | ppl 69.64 | epoch 247 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.38 | loss 4.32 | ppl 75.04 | epoch 247 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.52 | loss 4.24 | ppl 69.60 | epoch 247 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.29 | loss 4.24 | ppl 69.41 | epoch 247 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 4.21 | ppl 67.25 ----------------------------------------------------------------------------------------- | end of epoch 247 | time: 514.85s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 248 | 200/ 1568 batches | lr 30.00 | ms/batch 304.23 | loss 4.29 | ppl 72.71 | epoch 248 | 400/ 1568 batches | lr 30.00 | ms/batch 299.51 | loss 4.26 | ppl 71.16 | epoch 248 | 600/ 1568 batches | lr 30.00 | ms/batch 303.87 | loss 4.24 | ppl 69.70 | epoch 248 | 800/ 1568 batches | lr 30.00 | ms/batch 300.77 | loss 4.24 | ppl 69.71 | epoch 248 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.90 | loss 4.32 | ppl 74.93 | epoch 248 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.19 | loss 4.25 | ppl 69.86 | epoch 248 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.66 | loss 4.23 | ppl 68.82 | epoch 248 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.70 | loss 4.22 | ppl 68.32 ----------------------------------------------------------------------------------------- | end of epoch 248 | time: 515.17s | valid loss 4.79 | valid ppl 120.26 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 249 | 200/ 1568 batches | lr 30.00 | ms/batch 298.17 | loss 4.29 | ppl 73.00 | epoch 249 | 400/ 1568 batches | lr 30.00 | ms/batch 302.10 | loss 4.26 | ppl 71.13 | epoch 249 | 600/ 1568 batches | lr 30.00 | ms/batch 305.74 | loss 4.23 | ppl 69.03 | epoch 249 | 800/ 1568 batches | lr 30.00 | ms/batch 304.32 | loss 4.26 | ppl 70.96 | epoch 249 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.21 | loss 4.31 | ppl 74.11 | epoch 249 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.50 | loss 4.25 | ppl 70.02 | epoch 249 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.08 | loss 4.25 | ppl 69.84 | epoch 249 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.48 | loss 4.20 | ppl 66.96 ----------------------------------------------------------------------------------------- | end of epoch 249 | time: 515.27s | valid loss 4.79 | valid ppl 120.25 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 250 | 200/ 1568 batches | lr 30.00 | ms/batch 301.54 | loss 4.29 | ppl 73.08 | epoch 250 | 400/ 1568 batches | lr 30.00 | ms/batch 301.57 | loss 4.27 | ppl 71.58 | epoch 250 | 600/ 1568 batches | lr 30.00 | ms/batch 303.31 | loss 4.23 | ppl 69.04 | epoch 250 | 800/ 1568 batches | lr 30.00 | ms/batch 302.93 | loss 4.24 | ppl 69.33 | epoch 250 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.07 | loss 4.31 | ppl 74.41 | epoch 250 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.24 | ppl 69.49 | epoch 250 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.05 | loss 4.24 | ppl 69.35 | epoch 250 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.03 | loss 4.22 | ppl 68.15 ----------------------------------------------------------------------------------------- | end of epoch 250 | time: 515.13s | valid loss 4.79 | valid ppl 120.25 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 251 | 200/ 1568 batches | lr 30.00 | ms/batch 309.23 | loss 4.30 | ppl 73.84 | epoch 251 | 400/ 1568 batches | lr 30.00 | ms/batch 299.16 | loss 4.26 | ppl 70.68 | epoch 251 | 600/ 1568 batches | lr 30.00 | ms/batch 300.79 | loss 4.24 | ppl 69.13 | epoch 251 | 800/ 1568 batches | lr 30.00 | ms/batch 299.65 | loss 4.25 | ppl 70.13 | epoch 251 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.67 | loss 4.31 | ppl 74.50 | epoch 251 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.23 | loss 4.25 | ppl 70.23 | epoch 251 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.57 | loss 4.24 | ppl 69.73 | epoch 251 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.82 | loss 4.21 | ppl 67.47 ----------------------------------------------------------------------------------------- | end of epoch 251 | time: 514.95s | valid loss 4.79 | valid ppl 120.25 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 252 | 200/ 1568 batches | lr 30.00 | ms/batch 304.99 | loss 4.28 | ppl 72.32 | epoch 252 | 400/ 1568 batches | lr 30.00 | ms/batch 300.19 | loss 4.26 | ppl 70.53 | epoch 252 | 600/ 1568 batches | lr 30.00 | ms/batch 300.69 | loss 4.22 | ppl 68.02 | epoch 252 | 800/ 1568 batches | lr 30.00 | ms/batch 297.62 | loss 4.25 | ppl 70.12 | epoch 252 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.59 | loss 4.31 | ppl 74.46 | epoch 252 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.13 | loss 4.25 | ppl 69.87 | epoch 252 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.80 | loss 4.23 | ppl 68.40 | epoch 252 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.94 | loss 4.21 | ppl 67.52 ----------------------------------------------------------------------------------------- | end of epoch 252 | time: 515.44s | valid loss 4.79 | valid ppl 120.25 ----------------------------------------------------------------------------------------- Saving Averaged! 
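Across the end-of-epoch summaries from epoch 235 to epoch 252, validation perplexity moves only from 120.28 to 120.25 before the run is interrupted with ^C. A small parser makes this kind of plateau easier to spot in a saved log. The helpers below are hypothetical, not something shipped with awd-lstm-lm. They assume the `| end of epoch N | time: ...s | valid loss ... | valid ppl ...` layout shown here. Note that a `finetune.py` log restarts its epoch numbering at 1.

```python
import re
from typing import List, Tuple

# Matches the end-of-epoch summary; \s* tolerates the padded widths of the raw log.
END_OF_EPOCH = re.compile(
    r"\|\s*end of epoch\s+(\d+)\s*\|\s*time:\s*[\d.]+s\s*"
    r"\|\s*valid loss\s+([\d.]+)\s*\|\s*valid ppl\s+([\d.]+)"
)


def parse_validation(log_text: str) -> List[Tuple[int, float, float]]:
    """Return (epoch, valid_loss, valid_ppl) for every end-of-epoch summary, in order."""
    return [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in END_OF_EPOCH.finditer(log_text)
    ]


def ppl_gain(records: List[Tuple[int, float, float]], window: int = 10) -> float:
    """Drop in validation perplexity across the last `window` summaries (positive = improving)."""
    recent = [ppl for _, _, ppl in records[-(window + 1):]]
    return recent[0] - recent[-1]


sample = (
    "| end of epoch 235 | time: 515.09s | valid loss 4.79 | valid ppl 120.28 "
    "... | end of epoch 252 | time: 515.44s | valid loss 4.79 | valid ppl 120.25"
)
records = parse_validation(sample)
print(records, round(ppl_gain(records), 2))
```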
| epoch 253 | 200/ 1568 batches | lr 30.00 | ms/batch 307.76 | loss 4.29 | ppl 73.12 ^C
----------------------------------------------------------------------------------------- Exiting from training early ========================================================================================= | End of training | test loss 3.99 | test ppl 54.11 =========================================================================================
(awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python -u finetune.py --epochs 500 --data data/March-2018_16mb --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1500 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 20 --emsize=400 --save models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt Applying weight drop of 0.1 to weight Applying weight drop of 0.1 to weight Applying weight drop of 0.1 to weight Applying weight drop of 0.1 to weight [QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=800, out_features=4500) ) ), QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ), QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ), QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=1200) ) )] Args: Namespace(alpha=2, batch_size=20, beta=1, bptt=70, clip=0.25, cuda=True, data='data/March-2018_16mb', dropout=0.4, dropoute=0.1, dropouth=0.2, dropouti=0.4, emsize=400, epochs=500, log_interval=200, lr=30, model='QRNN', nhid=1500, nlayers=4, nonmono=5, save='models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt', seed=4002, tied=True, wdecay=1.2e-06, wdrop=0.1) Model total parameters: 83198208
| epoch 1 | 200/ 1568 batches | lr 30.00 | ms/batch 297.68 | loss 4.27 | ppl 71.24 | epoch 1 | 400/ 1568 batches | lr 30.00 | ms/batch 308.80 | loss 4.26 | ppl 70.76 | epoch 1 | 600/ 1568 batches | lr 30.00 | ms/batch 299.20 | loss 4.28 | ppl 72.29 | epoch 1 | 800/ 1568 batches | lr 30.00 | ms/batch 305.42 | loss 4.29 | ppl 73.31 | epoch 1 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.38 | ppl 80.07 | epoch 1 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.09 | loss 4.37 | ppl 79.34 | epoch 1 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.56 | loss 4.38 | ppl 79.51 | epoch 1 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.59 | loss 4.38 | ppl 79.77 ----------------------------------------------------------------------------------------- | end of epoch 1 | time: 515.99s | valid loss 4.73 | valid ppl 113.63 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 2 | 200/ 1568 batches | lr 30.00 | ms/batch 304.30 | loss 4.37 | ppl 79.08 | epoch 2 | 400/ 1568 batches | lr 30.00 | ms/batch 301.79 | loss 4.37 | ppl 78.98 | epoch 2 | 600/ 1568 batches | lr 30.00 | ms/batch 305.93 | loss 4.35 | ppl 77.82 | epoch 2 | 800/ 1568 batches | lr 30.00 | ms/batch 303.57 | loss 4.40 | ppl 81.39 | epoch 2 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.47 | ppl 87.33 | epoch 2 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.07 | loss 4.45 | ppl 85.82 | epoch 2 | 1400/ 1568 batches | lr 30.00 | ms/batch 296.64 | loss 4.47 | ppl 87.37 | epoch 2 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.50 | loss 4.45 | ppl 85.99 ----------------------------------------------------------------------------------------- | end of epoch 2 | time: 515.40s | valid loss 4.71 | valid ppl 110.82 ----------------------------------------------------------------------------------------- Saving Averaged!
| epoch 3 | 200/ 1568 batches | lr 30.00 | ms/batch 305.91 | loss 4.41 | ppl 81.91 | epoch 3 | 400/ 1568 batches | lr 30.00 | ms/batch 297.18 | loss 4.40 | ppl 81.67 | epoch 3 | 600/ 1568 batches | lr 30.00 | ms/batch 297.66 | loss 4.39 | ppl 80.77 | epoch 3 | 800/ 1568 batches | lr 30.00 | ms/batch 305.69 | loss 4.43 | ppl 83.91 | epoch 3 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.71 | loss 4.49 | ppl 89.11 | epoch 3 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.92 | loss 4.46 | ppl 86.14 | epoch 3 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.56 | loss 4.45 | ppl 86.03 | epoch 3 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.29 | loss 4.47 | ppl 87.09 ----------------------------------------------------------------------------------------- | end of epoch 3 | time: 515.48s | valid loss 4.69 | valid ppl 109.10 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 4 | 200/ 1568 batches | lr 30.00 | ms/batch 303.80 | loss 4.42 | ppl 82.94 | epoch 4 | 400/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.41 | ppl 81.98 | epoch 4 | 600/ 1568 batches | lr 30.00 | ms/batch 304.16 | loss 4.40 | ppl 81.33 | epoch 4 | 800/ 1568 batches | lr 30.00 | ms/batch 306.26 | loss 4.42 | ppl 82.98 | epoch 4 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.45 | loss 4.48 | ppl 87.95 | epoch 4 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.67 | loss 4.44 | ppl 84.77 | epoch 4 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.45 | ppl 85.79 | epoch 4 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.65 | loss 4.46 | ppl 86.17 ----------------------------------------------------------------------------------------- | end of epoch 4 | time: 515.47s | valid loss 4.68 | valid ppl 107.83 ----------------------------------------------------------------------------------------- Saving Averaged! 
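The `finetune.py` banner above reports `Model total parameters: 83198208`, but the four QRNN projections account for only part of that.

- Each `out_features` is three times the layer's hidden size, presumably one slice each for the candidate, forget and output gates.
- The last layer's 1200 = 3 × 400 presumably because tied weights force its output size to match `emsize`.
- The first layer's `in_features` of 800 = 2 × 400 suggests a window of two timesteps.

Assume the remainder is one tied `vocab × emsize` embedding/decoder matrix plus a `vocab`-sized decoder bias. The arithmetic below then implies a vocabulary size. This is an inference from the printed numbers, not a figure the log reports.

```python
def linear_params(in_f: int, out_f: int) -> int:
    """Weights plus bias of an nn.Linear-style layer."""
    return in_f * out_f + out_f


# QRNN gate projections as printed in the banner: (in_features, out_features).
qrnn = [(800, 4500), (1500, 4500), (1500, 4500), (1500, 1200)]
qrnn_total = sum(linear_params(i, o) for i, o in qrnn)

total = 83198208   # "Model total parameters" from the log
emsize = 400

# Assumption: what remains is a shared (vocab x emsize) embedding/decoder
# weight plus a decoder bias of length vocab, i.e. vocab * (emsize + 1).
vocab, leftover = divmod(total - qrnn_total, emsize + 1)
print(qrnn_total, vocab, leftover)
```

The remainder divides evenly by 401, which gives an implied vocabulary of 160,308 tokens. Under this reading, roughly three quarters of the 83M parameters sit in the tied embedding rather than in the QRNN layers.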
| epoch 5 | 200/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.41 | ppl 82.50 | epoch 5 | 400/ 1568 batches | lr 30.00 | ms/batch 299.63 | loss 4.41 | ppl 82.49 | epoch 5 | 600/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.42 | ppl 82.71 | epoch 5 | 800/ 1568 batches | lr 30.00 | ms/batch 305.34 | loss 4.41 | ppl 82.49 | epoch 5 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.60 | loss 4.48 | ppl 88.47 | epoch 5 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.58 | loss 4.45 | ppl 85.79 | epoch 5 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.79 | loss 4.46 | ppl 86.45 | epoch 5 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.13 | loss 4.44 | ppl 85.13 ----------------------------------------------------------------------------------------- | end of epoch 5 | time: 515.20s | valid loss 4.67 | valid ppl 106.91 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 6 | 200/ 1568 batches | lr 30.00 | ms/batch 304.51 | loss 4.42 | ppl 83.19 | epoch 6 | 400/ 1568 batches | lr 30.00 | ms/batch 305.58 | loss 4.41 | ppl 82.63 | epoch 6 | 600/ 1568 batches | lr 30.00 | ms/batch 303.57 | loss 4.40 | ppl 81.07 | epoch 6 | 800/ 1568 batches | lr 30.00 | ms/batch 304.19 | loss 4.42 | ppl 82.79 | epoch 6 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.80 | loss 4.47 | ppl 87.54 | epoch 6 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.50 | loss 4.45 | ppl 85.39 | epoch 6 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.01 | loss 4.44 | ppl 84.60 | epoch 6 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.62 | loss 4.42 | ppl 83.03 ----------------------------------------------------------------------------------------- | end of epoch 6 | time: 515.15s | valid loss 4.67 | valid ppl 106.21 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 7 | 200/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.42 | ppl 83.11 | epoch 7 | 400/ 1568 batches | lr 30.00 | ms/batch 301.38 | loss 4.43 | ppl 83.68 | epoch 7 | 600/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.40 | ppl 81.52 | epoch 7 | 800/ 1568 batches | lr 30.00 | ms/batch 302.99 | loss 4.41 | ppl 82.29 | epoch 7 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.58 | loss 4.47 | ppl 87.05 | epoch 7 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.74 | loss 4.43 | ppl 83.63 | epoch 7 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.13 | loss 4.42 | ppl 83.02 | epoch 7 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.18 | loss 4.41 | ppl 82.40 ----------------------------------------------------------------------------------------- | end of epoch 7 | time: 515.71s | valid loss 4.66 | valid ppl 105.62 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 8 | 200/ 1568 batches | lr 30.00 | ms/batch 308.67 | loss 4.41 | ppl 82.64 | epoch 8 | 400/ 1568 batches | lr 30.00 | ms/batch 300.87 | loss 4.40 | ppl 81.33 | epoch 8 | 600/ 1568 batches | lr 30.00 | ms/batch 303.36 | loss 4.39 | ppl 80.45 | epoch 8 | 800/ 1568 batches | lr 30.00 | ms/batch 303.26 | loss 4.40 | ppl 81.44 | epoch 8 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.33 | loss 4.48 | ppl 87.86 | epoch 8 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.44 | loss 4.42 | ppl 83.46 | epoch 8 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.30 | loss 4.43 | ppl 83.70 | epoch 8 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.41 | ppl 82.11 ----------------------------------------------------------------------------------------- | end of epoch 8 | time: 517.11s | valid loss 4.65 | valid ppl 105.10 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 9 | 200/ 1568 batches | lr 30.00 | ms/batch 304.42 | loss 4.41 | ppl 82.04 | epoch 9 | 400/ 1568 batches | lr 30.00 | ms/batch 300.21 | loss 4.40 | ppl 81.70 | epoch 9 | 600/ 1568 batches | lr 30.00 | ms/batch 302.95 | loss 4.40 | ppl 81.41 | epoch 9 | 800/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.40 | ppl 81.47 | epoch 9 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.43 | loss 4.47 | ppl 87.27 | epoch 9 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.95 | loss 4.41 | ppl 82.13 | epoch 9 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.49 | loss 4.40 | ppl 81.47 | epoch 9 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.19 | loss 4.40 | ppl 81.43 ----------------------------------------------------------------------------------------- | end of epoch 9 | time: 515.43s | valid loss 4.65 | valid ppl 104.67 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 10 | 200/ 1568 batches | lr 30.00 | ms/batch 304.05 | loss 4.39 | ppl 80.99 | epoch 10 | 400/ 1568 batches | lr 30.00 | ms/batch 306.28 | loss 4.40 | ppl 81.76 | epoch 10 | 600/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.37 | ppl 79.28 | epoch 10 | 800/ 1568 batches | lr 30.00 | ms/batch 305.25 | loss 4.41 | ppl 82.49 | epoch 10 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.56 | loss 4.44 | ppl 84.62 | epoch 10 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.50 | loss 4.42 | ppl 82.79 | epoch 10 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.67 | loss 4.41 | ppl 82.46 | epoch 10 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.51 | loss 4.38 | ppl 79.75 ----------------------------------------------------------------------------------------- | end of epoch 10 | time: 515.30s | valid loss 4.65 | valid ppl 104.28 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 11 | 200/ 1568 batches | lr 30.00 | ms/batch 306.92 | loss 4.41 | ppl 82.68 | epoch 11 | 400/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.39 | ppl 81.04 | epoch 11 | 600/ 1568 batches | lr 30.00 | ms/batch 306.39 | loss 4.38 | ppl 79.95 | epoch 11 | 800/ 1568 batches | lr 30.00 | ms/batch 306.25 | loss 4.40 | ppl 81.58 | epoch 11 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.95 | loss 4.45 | ppl 85.74 | epoch 11 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.41 | ppl 82.08 | epoch 11 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.55 | loss 4.42 | ppl 82.81 | epoch 11 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.10 | loss 4.36 | ppl 78.53 ----------------------------------------------------------------------------------------- | end of epoch 11 | time: 516.33s | valid loss 4.64 | valid ppl 103.94 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 12 | 200/ 1568 batches | lr 30.00 | ms/batch 306.16 | loss 4.41 | ppl 82.68 | epoch 12 | 400/ 1568 batches | lr 30.00 | ms/batch 298.34 | loss 4.39 | ppl 80.44 | epoch 12 | 600/ 1568 batches | lr 30.00 | ms/batch 300.09 | loss 4.37 | ppl 79.05 | epoch 12 | 800/ 1568 batches | lr 30.00 | ms/batch 303.69 | loss 4.38 | ppl 79.57 | epoch 12 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.97 | loss 4.46 | ppl 86.06 | epoch 12 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.39 | ppl 80.81 | epoch 12 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.40 | loss 4.41 | ppl 81.91 | epoch 12 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.92 | loss 4.38 | ppl 80.07 ----------------------------------------------------------------------------------------- | end of epoch 12 | time: 515.44s | valid loss 4.64 | valid ppl 103.63 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 13 | 200/ 1568 batches | lr 30.00 | ms/batch 305.50 | loss 4.40 | ppl 81.49 | epoch 13 | 400/ 1568 batches | lr 30.00 | ms/batch 298.23 | loss 4.38 | ppl 79.91 | epoch 13 | 600/ 1568 batches | lr 30.00 | ms/batch 305.15 | loss 4.38 | ppl 79.68 | epoch 13 | 800/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.39 | ppl 80.51 | epoch 13 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.89 | loss 4.43 | ppl 84.26 | epoch 13 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.96 | loss 4.39 | ppl 80.95 | epoch 13 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.58 | loss 4.40 | ppl 81.50 | epoch 13 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.56 | loss 4.36 | ppl 78.57 ----------------------------------------------------------------------------------------- | end of epoch 13 | time: 515.42s | valid loss 4.64 | valid ppl 103.35 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 14 | 200/ 1568 batches | lr 30.00 | ms/batch 308.48 | loss 4.39 | ppl 80.89 | epoch 14 | 400/ 1568 batches | lr 30.00 | ms/batch 301.35 | loss 4.37 | ppl 78.93 | epoch 14 | 600/ 1568 batches | lr 30.00 | ms/batch 303.76 | loss 4.36 | ppl 78.30 | epoch 14 | 800/ 1568 batches | lr 30.00 | ms/batch 304.24 | loss 4.38 | ppl 79.90 | epoch 14 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.09 | loss 4.42 | ppl 83.28 | epoch 14 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.37 | ppl 79.21 | epoch 14 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.66 | loss 4.38 | ppl 79.70 | epoch 14 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.36 | loss 4.35 | ppl 77.67 ----------------------------------------------------------------------------------------- | end of epoch 14 | time: 516.16s | valid loss 4.64 | valid ppl 103.12 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 15 | 200/ 1568 batches | lr 30.00 | ms/batch 303.12 | loss 4.39 | ppl 80.48 | epoch 15 | 400/ 1568 batches | lr 30.00 | ms/batch 297.78 | loss 4.39 | ppl 80.84 | epoch 15 | 600/ 1568 batches | lr 30.00 | ms/batch 297.99 | loss 4.35 | ppl 77.45 | epoch 15 | 800/ 1568 batches | lr 30.00 | ms/batch 303.42 | loss 4.36 | ppl 78.06 | epoch 15 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.88 | loss 4.43 | ppl 83.71 | epoch 15 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.34 | loss 4.38 | ppl 79.92 | epoch 15 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.33 | loss 4.38 | ppl 79.46 | epoch 15 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.88 | loss 4.36 | ppl 78.41 ----------------------------------------------------------------------------------------- | end of epoch 15 | time: 515.48s | valid loss 4.63 | valid ppl 102.91 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 16 | 200/ 1568 batches | lr 30.00 | ms/batch 302.42 | loss 4.38 | ppl 80.18 | epoch 16 | 400/ 1568 batches | lr 30.00 | ms/batch 305.89 | loss 4.37 | ppl 79.32 | epoch 16 | 600/ 1568 batches | lr 30.00 | ms/batch 299.35 | loss 4.35 | ppl 77.29 | epoch 16 | 800/ 1568 batches | lr 30.00 | ms/batch 302.66 | loss 4.37 | ppl 78.95 | epoch 16 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.42 | ppl 83.03 | epoch 16 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.01 | loss 4.36 | ppl 78.56 | epoch 16 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.08 | loss 4.37 | ppl 78.69 | epoch 16 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.70 | loss 4.35 | ppl 77.71 ----------------------------------------------------------------------------------------- | end of epoch 16 | time: 515.36s | valid loss 4.63 | valid ppl 102.71 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 17 | 200/ 1568 batches | lr 30.00 | ms/batch 301.75 | loss 4.39 | ppl 80.58 | epoch 17 | 400/ 1568 batches | lr 30.00 | ms/batch 301.65 | loss 4.38 | ppl 79.51 | epoch 17 | 600/ 1568 batches | lr 30.00 | ms/batch 302.82 | loss 4.35 | ppl 77.33 | epoch 17 | 800/ 1568 batches | lr 30.00 | ms/batch 303.73 | loss 4.37 | ppl 78.90 | epoch 17 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.35 | loss 4.42 | ppl 83.37 | epoch 17 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.50 | loss 4.36 | ppl 78.09 | epoch 17 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.36 | loss 4.35 | ppl 77.60 | epoch 17 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.77 | loss 4.34 | ppl 77.00 ----------------------------------------------------------------------------------------- | end of epoch 17 | time: 515.33s | valid loss 4.63 | valid ppl 102.54 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 18 | 200/ 1568 batches | lr 30.00 | ms/batch 299.06 | loss 4.39 | ppl 80.63 | epoch 18 | 400/ 1568 batches | lr 30.00 | ms/batch 307.58 | loss 4.37 | ppl 78.95 | epoch 18 | 600/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.33 | ppl 76.22 | epoch 18 | 800/ 1568 batches | lr 30.00 | ms/batch 298.76 | loss 4.37 | ppl 79.19 | epoch 18 | 1000/ 1568 batches | lr 30.00 | ms/batch 307.14 | loss 4.42 | ppl 83.35 | epoch 18 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.36 | loss 4.35 | ppl 77.69 | epoch 18 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.84 | loss 4.36 | ppl 78.07 | epoch 18 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.72 | loss 4.34 | ppl 76.65 ----------------------------------------------------------------------------------------- | end of epoch 18 | time: 516.19s | valid loss 4.63 | valid ppl 102.38 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 19 | 200/ 1568 batches | lr 30.00 | ms/batch 303.97 | loss 4.39 | ppl 80.35 | epoch 19 | 400/ 1568 batches | lr 30.00 | ms/batch 303.17 | loss 4.36 | ppl 78.26 | epoch 19 | 600/ 1568 batches | lr 30.00 | ms/batch 299.80 | loss 4.34 | ppl 76.85 | epoch 19 | 800/ 1568 batches | lr 30.00 | ms/batch 299.60 | loss 4.36 | ppl 77.98 | epoch 19 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.41 | ppl 82.35 | epoch 19 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.64 | loss 4.36 | ppl 77.97 | epoch 19 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.41 | loss 4.35 | ppl 77.66 | epoch 19 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.35 | ppl 77.16 ----------------------------------------------------------------------------------------- | end of epoch 19 | time: 515.30s | valid loss 4.63 | valid ppl 102.25 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 20 | 200/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.37 | ppl 79.39 | epoch 20 | 400/ 1568 batches | lr 30.00 | ms/batch 296.92 | loss 4.36 | ppl 78.23 | epoch 20 | 600/ 1568 batches | lr 30.00 | ms/batch 303.05 | loss 4.33 | ppl 76.10 | epoch 20 | 800/ 1568 batches | lr 30.00 | ms/batch 300.41 | loss 4.35 | ppl 77.23 | epoch 20 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.26 | loss 4.40 | ppl 81.60 | epoch 20 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.85 | loss 4.35 | ppl 77.45 | epoch 20 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.89 | loss 4.35 | ppl 77.27 | epoch 20 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.05 | loss 4.33 | ppl 75.97 ----------------------------------------------------------------------------------------- | end of epoch 20 | time: 515.79s | valid loss 4.63 | valid ppl 102.12 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 21 | 200/ 1568 batches | lr 30.00 | ms/batch 303.20 | loss 4.37 | ppl 78.92
| epoch 21 | 400/ 1568 batches | lr 30.00 | ms/batch 301.77 | loss 4.37 | ppl 78.77
| epoch 21 | 600/ 1568 batches | lr 30.00 | ms/batch 300.64 | loss 4.33 | ppl 75.66
| epoch 21 | 800/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.36 | ppl 78.00
| epoch 21 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.33 | loss 4.40 | ppl 81.32
| epoch 21 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.89 | loss 4.35 | ppl 77.69
| epoch 21 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.33 | ppl 75.60
| epoch 21 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.83 | loss 4.35 | ppl 77.63
-----------------------------------------------------------------------------------------
| end of epoch 21 | time: 515.75s | valid loss 4.62 | valid ppl 101.99
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 22 | 200/ 1568 batches | lr 30.00 | ms/batch 303.86 | loss 4.36 | ppl 78.44
| epoch 22 | 400/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.35 | ppl 77.33
| epoch 22 | 600/ 1568 batches | lr 30.00 | ms/batch 301.73 | loss 4.33 | ppl 75.65
| epoch 22 | 800/ 1568 batches | lr 30.00 | ms/batch 302.34 | loss 4.34 | ppl 76.61
| epoch 22 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.73 | loss 4.40 | ppl 81.53
| epoch 22 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.42 | loss 4.36 | ppl 78.26
| epoch 22 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.34 | ppl 76.33
| epoch 22 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.34 | loss 4.33 | ppl 76.01
-----------------------------------------------------------------------------------------
| end of epoch 22 | time: 515.65s | valid loss 4.62 | valid ppl 101.86
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 23 | 200/ 1568 batches | lr 30.00 | ms/batch 303.78 | loss 4.36 | ppl 78.64
| epoch 23 | 400/ 1568 batches | lr 30.00 | ms/batch 301.71 | loss 4.36 | ppl 78.05
| epoch 23 | 600/ 1568 batches | lr 30.00 | ms/batch 303.36 | loss 4.35 | ppl 77.12
| epoch 23 | 800/ 1568 batches | lr 30.00 | ms/batch 296.92 | loss 4.33 | ppl 75.86
| epoch 23 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.87 | loss 4.40 | ppl 81.86
| epoch 23 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.59 | loss 4.35 | ppl 77.12
| epoch 23 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.55 | loss 4.33 | ppl 75.69
| epoch 23 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.17 | loss 4.32 | ppl 74.97
-----------------------------------------------------------------------------------------
| end of epoch 23 | time: 515.69s | valid loss 4.62 | valid ppl 101.74
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 24 | 200/ 1568 batches | lr 30.00 | ms/batch 303.63 | loss 4.36 | ppl 78.05
| epoch 24 | 400/ 1568 batches | lr 30.00 | ms/batch 297.84 | loss 4.34 | ppl 76.81
| epoch 24 | 600/ 1568 batches | lr 30.00 | ms/batch 294.64 | loss 4.33 | ppl 75.84
| epoch 24 | 800/ 1568 batches | lr 30.00 | ms/batch 304.08 | loss 4.33 | ppl 76.11
| epoch 24 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.51 | loss 4.38 | ppl 80.14
| epoch 24 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.86 | loss 4.34 | ppl 76.46
| epoch 24 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.34 | loss 4.33 | ppl 76.06
| epoch 24 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.01 | loss 4.31 | ppl 74.55
-----------------------------------------------------------------------------------------
| end of epoch 24 | time: 515.52s | valid loss 4.62 | valid ppl 101.63
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 25 | 200/ 1568 batches | lr 30.00 | ms/batch 306.32 | loss 4.35 | ppl 77.82
| epoch 25 | 400/ 1568 batches | lr 30.00 | ms/batch 297.60 | loss 4.36 | ppl 78.02
| epoch 25 | 600/ 1568 batches | lr 30.00 | ms/batch 301.94 | loss 4.34 | ppl 76.35
| epoch 25 | 800/ 1568 batches | lr 30.00 | ms/batch 305.83 | loss 4.34 | ppl 76.39
| epoch 25 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.75 | loss 4.38 | ppl 79.69
| epoch 25 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.63 | loss 4.34 | ppl 76.79
| epoch 25 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.96 | loss 4.33 | ppl 75.81
| epoch 25 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.31 | ppl 74.66
-----------------------------------------------------------------------------------------
| end of epoch 25 | time: 515.35s | valid loss 4.62 | valid ppl 101.54
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 26 | 200/ 1568 batches | lr 30.00 | ms/batch 306.02 | loss 4.35 | ppl 77.18
| epoch 26 | 400/ 1568 batches | lr 30.00 | ms/batch 305.03 | loss 4.35 | ppl 77.74
| epoch 26 | 600/ 1568 batches | lr 30.00 | ms/batch 301.88 | loss 4.32 | ppl 75.37
| epoch 26 | 800/ 1568 batches | lr 30.00 | ms/batch 303.43 | loss 4.35 | ppl 77.18
| epoch 26 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.02 | loss 4.37 | ppl 79.14
| epoch 26 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.76 | loss 4.34 | ppl 76.70
| epoch 26 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.34 | loss 4.33 | ppl 75.64
| epoch 26 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.38 | loss 4.30 | ppl 73.76
-----------------------------------------------------------------------------------------
| end of epoch 26 | time: 515.10s | valid loss 4.62 | valid ppl 101.44
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 27 | 200/ 1568 batches | lr 30.00 | ms/batch 300.96 | loss 4.35 | ppl 77.76
| epoch 27 | 400/ 1568 batches | lr 30.00 | ms/batch 300.03 | loss 4.34 | ppl 76.62
| epoch 27 | 600/ 1568 batches | lr 30.00 | ms/batch 304.00 | loss 4.33 | ppl 75.58
| epoch 27 | 800/ 1568 batches | lr 30.00 | ms/batch 307.47 | loss 4.34 | ppl 77.01
| epoch 27 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.37 | ppl 78.73
| epoch 27 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.70 | loss 4.32 | ppl 75.33
| epoch 27 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.85 | loss 4.32 | ppl 75.07
| epoch 27 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.30 | ppl 73.65
-----------------------------------------------------------------------------------------
| end of epoch 27 | time: 515.11s | valid loss 4.62 | valid ppl 101.35
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 28 | 200/ 1568 batches | lr 30.00 | ms/batch 305.95 | loss 4.35 | ppl 77.20
| epoch 28 | 400/ 1568 batches | lr 30.00 | ms/batch 300.67 | loss 4.33 | ppl 75.68
| epoch 28 | 600/ 1568 batches | lr 30.00 | ms/batch 303.50 | loss 4.31 | ppl 74.51
| epoch 28 | 800/ 1568 batches | lr 30.00 | ms/batch 301.83 | loss 4.34 | ppl 76.35
| epoch 28 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.13 | loss 4.39 | ppl 80.34
| epoch 28 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.31 | loss 4.32 | ppl 75.27
| epoch 28 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.01 | loss 4.32 | ppl 75.05
| epoch 28 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.40 | loss 4.32 | ppl 74.82
-----------------------------------------------------------------------------------------
| end of epoch 28 | time: 514.78s | valid loss 4.62 | valid ppl 101.26
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 29 | 200/ 1568 batches | lr 30.00 | ms/batch 306.33 | loss 4.34 | ppl 76.95
| epoch 29 | 400/ 1568 batches | lr 30.00 | ms/batch 301.71 | loss 4.35 | ppl 77.55
| epoch 29 | 600/ 1568 batches | lr 30.00 | ms/batch 300.67 | loss 4.31 | ppl 74.53
| epoch 29 | 800/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.33 | ppl 75.58
| epoch 29 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.19 | loss 4.37 | ppl 79.20
| epoch 29 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.30 | ppl 73.97
| epoch 29 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.39 | loss 4.32 | ppl 75.18
| epoch 29 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.89 | loss 4.32 | ppl 74.92
-----------------------------------------------------------------------------------------
| end of epoch 29 | time: 515.79s | valid loss 4.62 | valid ppl 101.18
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 30 | 200/ 1568 batches | lr 30.00 | ms/batch 307.00 | loss 4.35 | ppl 77.85
| epoch 30 | 400/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.34 | ppl 76.67
| epoch 30 | 600/ 1568 batches | lr 30.00 | ms/batch 300.08 | loss 4.31 | ppl 74.67
| epoch 30 | 800/ 1568 batches | lr 30.00 | ms/batch 303.85 | loss 4.32 | ppl 75.26
| epoch 30 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.89 | loss 4.36 | ppl 77.96
| epoch 30 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.09 | loss 4.33 | ppl 75.62
| epoch 30 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.09 | loss 4.32 | ppl 75.13
| epoch 30 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.30 | ppl 73.59
-----------------------------------------------------------------------------------------
| end of epoch 30 | time: 515.35s | valid loss 4.62 | valid ppl 101.09
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 31 | 200/ 1568 batches | lr 30.00 | ms/batch 301.96 | loss 4.35 | ppl 77.17
| epoch 31 | 400/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.33 | ppl 76.31
| epoch 31 | 600/ 1568 batches | lr 30.00 | ms/batch 301.45 | loss 4.31 | ppl 74.12
| epoch 31 | 800/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.32 | ppl 75.55
| epoch 31 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.94 | loss 4.37 | ppl 79.01
| epoch 31 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.88 | loss 4.32 | ppl 75.18
| epoch 31 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.95 | loss 4.30 | ppl 73.90
| epoch 31 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.29 | loss 4.29 | ppl 73.06
-----------------------------------------------------------------------------------------
| end of epoch 31 | time: 515.47s | valid loss 4.62 | valid ppl 101.02
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 32 | 200/ 1568 batches | lr 30.00 | ms/batch 303.40 | loss 4.34 | ppl 76.34
| epoch 32 | 400/ 1568 batches | lr 30.00 | ms/batch 297.21 | loss 4.33 | ppl 75.76
| epoch 32 | 600/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.31 | ppl 74.22
| epoch 32 | 800/ 1568 batches | lr 30.00 | ms/batch 301.83 | loss 4.31 | ppl 74.75
| epoch 32 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.10 | loss 4.39 | ppl 80.56
| epoch 32 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.31 | ppl 74.64
| epoch 32 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.57 | loss 4.30 | ppl 73.78
| epoch 32 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.11 | loss 4.29 | ppl 73.06
-----------------------------------------------------------------------------------------
| end of epoch 32 | time: 515.13s | valid loss 4.61 | valid ppl 100.96
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 33 | 200/ 1568 batches | lr 30.00 | ms/batch 302.64 | loss 4.33 | ppl 75.73
| epoch 33 | 400/ 1568 batches | lr 30.00 | ms/batch 297.34 | loss 4.34 | ppl 76.87
| epoch 33 | 600/ 1568 batches | lr 30.00 | ms/batch 299.01 | loss 4.29 | ppl 73.09
| epoch 33 | 800/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.30 | ppl 74.01
| epoch 33 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.22 | loss 4.38 | ppl 79.70
| epoch 33 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.77 | loss 4.31 | ppl 74.14
| epoch 33 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.70 | loss 4.29 | ppl 73.13
| epoch 33 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.31 | ppl 74.74
-----------------------------------------------------------------------------------------
| end of epoch 33 | time: 515.43s | valid loss 4.61 | valid ppl 100.90
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 34 | 200/ 1568 batches | lr 30.00 | ms/batch 303.87 | loss 4.34 | ppl 76.39
| epoch 34 | 400/ 1568 batches | lr 30.00 | ms/batch 303.71 | loss 4.34 | ppl 76.95
| epoch 34 | 600/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.29 | ppl 73.04
| epoch 34 | 800/ 1568 batches | lr 30.00 | ms/batch 297.37 | loss 4.30 | ppl 74.04
| epoch 34 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.53 | loss 4.37 | ppl 78.73
| epoch 34 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.93 | loss 4.30 | ppl 73.53
| epoch 34 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.00 | loss 4.29 | ppl 73.02
| epoch 34 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.52 | loss 4.28 | ppl 72.27
-----------------------------------------------------------------------------------------
| end of epoch 34 | time: 515.26s | valid loss 4.61 | valid ppl 100.84
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 35 | 200/ 1568 batches | lr 30.00 | ms/batch 304.63 | loss 4.34 | ppl 76.91
| epoch 35 | 400/ 1568 batches | lr 30.00 | ms/batch 304.88 | loss 4.32 | ppl 74.98
| epoch 35 | 600/ 1568 batches | lr 30.00 | ms/batch 298.76 | loss 4.31 | ppl 74.34
| epoch 35 | 800/ 1568 batches | lr 30.00 | ms/batch 303.60 | loss 4.30 | ppl 74.02
| epoch 35 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.55 | loss 4.37 | ppl 78.68
| epoch 35 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.84 | loss 4.30 | ppl 73.57
| epoch 35 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.26 | loss 4.30 | ppl 73.38
| epoch 35 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.06 | loss 4.28 | ppl 72.04
-----------------------------------------------------------------------------------------
| end of epoch 35 | time: 515.01s | valid loss 4.61 | valid ppl 100.79
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 36 | 200/ 1568 batches | lr 30.00 | ms/batch 308.38 | loss 4.35 | ppl 77.30
| epoch 36 | 400/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.33 | ppl 75.73
| epoch 36 | 600/ 1568 batches | lr 30.00 | ms/batch 304.64 | loss 4.28 | ppl 72.29
| epoch 36 | 800/ 1568 batches | lr 30.00 | ms/batch 306.19 | loss 4.31 | ppl 74.40
| epoch 36 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.87 | loss 4.35 | ppl 77.75
| epoch 36 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.26 | loss 4.31 | ppl 74.54
| epoch 36 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.31 | ppl 74.12
| epoch 36 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.21 | loss 4.28 | ppl 71.89
-----------------------------------------------------------------------------------------
| end of epoch 36 | time: 515.35s | valid loss 4.61 | valid ppl 100.75
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 37 | 200/ 1568 batches | lr 30.00 | ms/batch 295.63 | loss 4.33 | ppl 76.28
| epoch 37 | 400/ 1568 batches | lr 30.00 | ms/batch 302.40 | loss 4.32 | ppl 74.88
| epoch 37 | 600/ 1568 batches | lr 30.00 | ms/batch 299.91 | loss 4.27 | ppl 71.61
| epoch 37 | 800/ 1568 batches | lr 30.00 | ms/batch 305.86 | loss 4.29 | ppl 73.32
| epoch 37 | 1000/ 1568 batches | lr 30.00 | ms/batch 307.32 | loss 4.35 | ppl 77.70
| epoch 37 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.29 | loss 4.28 | ppl 72.17
| epoch 37 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.09 | loss 4.30 | ppl 73.72
| epoch 37 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.57 | loss 4.28 | ppl 72.35
-----------------------------------------------------------------------------------------
| end of epoch 37 | time: 515.13s | valid loss 4.61 | valid ppl 100.69
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 38 | 200/ 1568 batches | lr 30.00 | ms/batch 306.40 | loss 4.36 | ppl 77.95
| epoch 38 | 400/ 1568 batches | lr 30.00 | ms/batch 299.09 | loss 4.31 | ppl 74.18
| epoch 38 | 600/ 1568 batches | lr 30.00 | ms/batch 305.69 | loss 4.30 | ppl 73.79
| epoch 38 | 800/ 1568 batches | lr 30.00 | ms/batch 302.28 | loss 4.32 | ppl 75.42
| epoch 38 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.42 | loss 4.34 | ppl 76.82
| epoch 38 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.05 | loss 4.28 | ppl 72.40
| epoch 38 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.38 | loss 4.28 | ppl 72.58
| epoch 38 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.19 | loss 4.30 | ppl 73.69
-----------------------------------------------------------------------------------------
| end of epoch 38 | time: 515.01s | valid loss 4.61 | valid ppl 100.65
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 39 | 200/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.34 | ppl 76.51
| epoch 39 | 400/ 1568 batches | lr 30.00 | ms/batch 298.96 | loss 4.33 | ppl 75.63
| epoch 39 | 600/ 1568 batches | lr 30.00 | ms/batch 298.11 | loss 4.29 | ppl 72.70
| epoch 39 | 800/ 1568 batches | lr 30.00 | ms/batch 303.44 | loss 4.32 | ppl 75.39
| epoch 39 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.34 | ppl 76.69
| epoch 39 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.67 | loss 4.29 | ppl 73.28
| epoch 39 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.54 | loss 4.30 | ppl 73.43
| epoch 39 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.17 | loss 4.27 | ppl 71.68
-----------------------------------------------------------------------------------------
| end of epoch 39 | time: 514.68s | valid loss 4.61 | valid ppl 100.60
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 40 | 200/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.32 | ppl 75.16
| epoch 40 | 400/ 1568 batches | lr 30.00 | ms/batch 297.00 | loss 4.30 | ppl 73.93
| epoch 40 | 600/ 1568 batches | lr 30.00 | ms/batch 301.92 | loss 4.29 | ppl 72.83
| epoch 40 | 800/ 1568 batches | lr 30.00 | ms/batch 302.29 | loss 4.30 | ppl 73.91
| epoch 40 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.86 | loss 4.34 | ppl 77.06
| epoch 40 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.34 | loss 4.28 | ppl 72.31
| epoch 40 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.69 | loss 4.30 | ppl 73.72
| epoch 40 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.28 | loss 4.28 | ppl 72.02
-----------------------------------------------------------------------------------------
| end of epoch 40 | time: 514.47s | valid loss 4.61 | valid ppl 100.56
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 41 | 200/ 1568 batches | lr 30.00 | ms/batch 304.98 | loss 4.33 | ppl 76.16
| epoch 41 | 400/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.33 | ppl 76.28
| epoch 41 | 600/ 1568 batches | lr 30.00 | ms/batch 306.79 | loss 4.28 | ppl 72.53
| epoch 41 | 800/ 1568 batches | lr 30.00 | ms/batch 296.54 | loss 4.29 | ppl 72.86
| epoch 41 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.89 | loss 4.34 | ppl 76.78
| epoch 41 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.94 | loss 4.29 | ppl 73.19
| epoch 41 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.48 | loss 4.29 | ppl 72.75
| epoch 41 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.32 | loss 4.27 | ppl 71.62
-----------------------------------------------------------------------------------------
| end of epoch 41 | time: 514.90s | valid loss 4.61 | valid ppl 100.53
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 42 | 200/ 1568 batches | lr 30.00 | ms/batch 301.75 | loss 4.32 | ppl 75.27
| epoch 42 | 400/ 1568 batches | lr 30.00 | ms/batch 298.85 | loss 4.31 | ppl 74.31
| epoch 42 | 600/ 1568 batches | lr 30.00 | ms/batch 296.55 | loss 4.30 | ppl 73.61
| epoch 42 | 800/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.29 | ppl 72.97
| epoch 42 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.87 | loss 4.36 | ppl 78.50
| epoch 42 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.56 | loss 4.28 | ppl 72.04
| epoch 42 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.28 | ppl 72.11
| epoch 42 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.15 | loss 4.27 | ppl 71.62
-----------------------------------------------------------------------------------------
| end of epoch 42 | time: 515.31s | valid loss 4.61 | valid ppl 100.50
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 43 | 200/ 1568 batches | lr 30.00 | ms/batch 301.47 | loss 4.33 | ppl 76.24
| epoch 43 | 400/ 1568 batches | lr 30.00 | ms/batch 300.62 | loss 4.32 | ppl 74.88
| epoch 43 | 600/ 1568 batches | lr 30.00 | ms/batch 303.66 | loss 4.29 | ppl 72.85
| epoch 43 | 800/ 1568 batches | lr 30.00 | ms/batch 299.71 | loss 4.30 | ppl 73.93
| epoch 43 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.78 | loss 4.34 | ppl 76.81
| epoch 43 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.29 | ppl 73.31
| epoch 43 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.83 | loss 4.28 | ppl 72.15
| epoch 43 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.57 | loss 4.28 | ppl 72.05
-----------------------------------------------------------------------------------------
| end of epoch 43 | time: 514.67s | valid loss 4.61 | valid ppl 100.46
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 44 | 200/ 1568 batches | lr 30.00 | ms/batch 300.72 | loss 4.32 | ppl 75.52
| epoch 44 | 400/ 1568 batches | lr 30.00 | ms/batch 294.21 | loss 4.31 | ppl 74.20
| epoch 44 | 600/ 1568 batches | lr 30.00 | ms/batch 298.22 | loss 4.29 | ppl 72.83
| epoch 44 | 800/ 1568 batches | lr 30.00 | ms/batch 307.52 | loss 4.29 | ppl 72.92
| epoch 44 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.76 | loss 4.35 | ppl 77.40
| epoch 44 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.52 | loss 4.28 | ppl 72.58
| epoch 44 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.40 | loss 4.29 | ppl 72.70
| epoch 44 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.17 | loss 4.28 | ppl 72.26
-----------------------------------------------------------------------------------------
| end of epoch 44 | time: 515.43s | valid loss 4.61 | valid ppl 100.43
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 45 | 200/ 1568 batches | lr 30.00 | ms/batch 307.00 | loss 4.33 | ppl 75.60
| epoch 45 | 400/ 1568 batches | lr 30.00 | ms/batch 302.97 | loss 4.31 | ppl 74.38
| epoch 45 | 600/ 1568 batches | lr 30.00 | ms/batch 300.29 | loss 4.29 | ppl 72.68
| epoch 45 | 800/ 1568 batches | lr 30.00 | ms/batch 304.61 | loss 4.30 | ppl 73.41
| epoch 45 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.54 | loss 4.34 | ppl 76.73
| epoch 45 | 1200/ 1568 batches | lr 30.00 | ms/batch 295.15 | loss 4.26 | ppl 70.84
| epoch 45 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.05 | loss 4.27 | ppl 71.72
| epoch 45 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.80 | loss 4.24 | ppl 69.68
-----------------------------------------------------------------------------------------
| end of epoch 45 | time: 515.68s | valid loss 4.61 | valid ppl 100.40
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 46 | 200/ 1568 batches | lr 30.00 | ms/batch 310.64 | loss 4.33 | ppl 75.57
| epoch 46 | 400/ 1568 batches | lr 30.00 | ms/batch 298.69 | loss 4.28 | ppl 72.25
| epoch 46 | 600/ 1568 batches | lr 30.00 | ms/batch 300.61 | loss 4.30 | ppl 73.42
| epoch 46 | 800/ 1568 batches | lr 30.00 | ms/batch 305.36 | loss 4.29 | ppl 72.91
| epoch 46 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.25 | loss 4.34 | ppl 76.58
| epoch 46 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.96 | loss 4.27 | ppl 71.60
| epoch 46 | 1400/ 1568 batches | lr 30.00 | ms/batch 306.81 | loss 4.27 | ppl 71.47
| epoch 46 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.78 | loss 4.26 | ppl 70.53
-----------------------------------------------------------------------------------------
| end of epoch 46 | time: 516.22s | valid loss 4.61 | valid ppl 100.37
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 47 | 200/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.33 | ppl 75.61
| epoch 47 | 400/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.31 | ppl 74.13
| epoch 47 | 600/ 1568 batches | lr 30.00 | ms/batch 302.60 | loss 4.27 | ppl 71.72
| epoch 47 | 800/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.29 | ppl 72.66
| epoch 47 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.33 | ppl 76.27
| epoch 47 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.17 | loss 4.28 | ppl 72.18
| epoch 47 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.39 | loss 4.29 | ppl 72.95
| epoch 47 | 1600/ 1568 batches | lr 30.00 | ms/batch 305.14 | loss 4.26 | ppl 70.53
-----------------------------------------------------------------------------------------
| end of epoch 47 | time: 516.02s | valid loss 4.61 | valid ppl 100.34
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 48 | 200/ 1568 batches | lr 30.00 | ms/batch 301.62 | loss 4.32 | ppl 75.11
| epoch 48 | 400/ 1568 batches | lr 30.00 | ms/batch 303.44 | loss 4.31 | ppl 74.09
| epoch 48 | 600/ 1568 batches | lr 30.00 | ms/batch 300.53 | loss 4.27 | ppl 71.30
| epoch 48 | 800/ 1568 batches | lr 30.00 | ms/batch 303.34 | loss 4.27 | ppl 71.78
| epoch 48 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.75 | loss 4.34 | ppl 77.07
| epoch 48 | 1200/ 1568 batches | lr 30.00 | ms/batch 306.02 | loss 4.29 | ppl 73.32
| epoch 48 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.98 | loss 4.28 | ppl 72.07
| epoch 48 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.52 | loss 4.25 | ppl 70.44
-----------------------------------------------------------------------------------------
| end of epoch 48 | time: 516.09s | valid loss 4.61 | valid ppl 100.30
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 49 | 200/ 1568 batches | lr 30.00 | ms/batch 299.79 | loss 4.31 | ppl 74.25
| epoch 49 | 400/ 1568 batches | lr 30.00 | ms/batch 302.79 | loss 4.29 | ppl 72.79
| epoch 49 | 600/ 1568 batches | lr 30.00 | ms/batch 302.32 | loss 4.28 | ppl 72.04
| epoch 49 | 800/ 1568 batches | lr 30.00 | ms/batch 302.84 | loss 4.27 | ppl 71.76
| epoch 49 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.07 | loss 4.32 | ppl 75.16
| epoch 49 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.00 | loss 4.27 | ppl 71.88
| epoch 49 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.06 | loss 4.27 | ppl 71.64
| epoch 49 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.19 | loss 4.27 | ppl 71.64
-----------------------------------------------------------------------------------------
| end of epoch 49 | time: 516.04s | valid loss 4.61 | valid ppl 100.27
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 50 | 200/ 1568 batches | lr 30.00 | ms/batch 305.82 | loss 4.33 | ppl 75.65
| epoch 50 | 400/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.29 | ppl 73.07
| epoch 50 | 600/ 1568 batches | lr 30.00 | ms/batch 303.34 | loss 4.25 | ppl 70.26
| epoch 50 | 800/ 1568 batches | lr 30.00 | ms/batch 306.20 | loss 4.29 | ppl 73.03
| epoch 50 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.34 | loss 4.32 | ppl 75.12
| epoch 50 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.26 | ppl 71.09
| epoch 50 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.22 | loss 4.26 | ppl 71.00
| epoch 50 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.94 | loss 4.26 | ppl 71.06
-----------------------------------------------------------------------------------------
| end of epoch 50 | time: 516.44s | valid loss 4.61 | valid ppl 100.25
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 51 | 200/ 1568 batches | lr 30.00 | ms/batch 304.81 | loss 4.32 | ppl 74.92
| epoch 51 | 400/ 1568 batches | lr 30.00 | ms/batch 299.76 | loss 4.29 | ppl 72.77
| epoch 51 | 600/ 1568 batches | lr 30.00 | ms/batch 303.23 | loss 4.26 | ppl 70.94
| epoch 51 | 800/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.29 | ppl 72.96
| epoch 51 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.08 | loss 4.33 | ppl 75.71
| epoch 51 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.49 | loss 4.27 | ppl 71.60
| epoch 51 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.87 | loss 4.27 | ppl 71.50
| epoch 51 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.66 | loss 4.24 | ppl 69.30
-----------------------------------------------------------------------------------------
| end of epoch 51 | time: 516.05s | valid loss 4.61 | valid ppl 100.23
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 52 | 200/ 1568 batches | lr 30.00 | ms/batch 305.87 | loss 4.31 | ppl 74.56
| epoch 52 | 400/ 1568 batches | lr 30.00 | ms/batch 298.03 | loss 4.29 | ppl 72.74
| epoch 52 | 600/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.29 | ppl 72.65
| epoch 52 | 800/ 1568 batches | lr 30.00 | ms/batch 299.58 | loss 4.27 | ppl 71.81
| epoch 52 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.91 | loss 4.32 | ppl 75.47
| epoch 52 | 1200/ 1568 batches | lr 30.00 | ms/batch 304.07 | loss 4.28 | ppl 72.03
| epoch 52 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.27 | ppl 71.39
| epoch 52 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.95 | loss 4.25 | ppl 70.07
-----------------------------------------------------------------------------------------
| end of epoch 52 | time: 516.20s | valid loss 4.61 | valid ppl 100.21
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 53 | 200/ 1568 batches | lr 30.00 | ms/batch 308.20 | loss 4.33 | ppl 76.26
| epoch 53 | 400/ 1568 batches | lr 30.00 | ms/batch 299.80 | loss 4.30 | ppl 73.51
| epoch 53 | 600/ 1568 batches | lr 30.00 | ms/batch 296.26 | loss 4.26 | ppl 71.06
| epoch 53 | 800/ 1568 batches | lr 30.00 | ms/batch 303.48 | loss 4.26 | ppl 71.14
| epoch 53 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.46 | loss 4.33 | ppl 75.88
| epoch 53 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.36 | loss 4.27 | ppl 71.74
| epoch 53 | 1400/ 1568 batches | lr 30.00 | ms/batch 307.39 | loss 4.27 | ppl 71.76
| epoch 53 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.43 | loss 4.24 | ppl 69.16
-----------------------------------------------------------------------------------------
| end of epoch 53 | time: 516.66s | valid loss 4.61 | valid ppl 100.18
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 54 | 200/ 1568 batches | lr 30.00 | ms/batch 302.98 | loss 4.29 | ppl 73.16
| epoch 54 | 400/ 1568 batches | lr 30.00 | ms/batch 300.96 | loss 4.29 | ppl 72.65
| epoch 54 | 600/ 1568 batches | lr 30.00 | ms/batch 300.23 | loss 4.25 | ppl 70.27
| epoch 54 | 800/ 1568 batches | lr 30.00 | ms/batch 306.98 | loss 4.28 | ppl 71.98
| epoch 54 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.50 | loss 4.32 | ppl 75.50
| epoch 54 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.63 | loss 4.28 | ppl 71.99
| epoch 54 | 1400/ 1568 batches | lr 30.00 | ms/batch 305.32 | loss 4.27 | ppl 71.87
| epoch 54 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.09 | loss 4.26 | ppl 70.49
-----------------------------------------------------------------------------------------
| end of epoch 54 | time: 516.29s | valid loss 4.61 | valid ppl 100.16
-----------------------------------------------------------------------------------------
Saving Averaged!
| epoch 55 | 200/ 1568 batches | lr 30.00 | ms/batch 305.22 | loss 4.32 | ppl 75.21 | epoch 55 | 400/ 1568 batches | lr 30.00 | ms/batch 294.55 | loss 4.28 | ppl 72.20 | epoch 55 | 600/ 1568 batches | lr 30.00 | ms/batch 299.53 | loss 4.27 | ppl 71.60 | epoch 55 | 800/ 1568 batches | lr 30.00 | ms/batch 300.89 | loss 4.27 | ppl 71.27 | epoch 55 | 1000/ 1568 batches | lr 30.00 | ms/batch 296.18 | loss 4.33 | ppl 75.72 | epoch 55 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.42 | loss 4.26 | ppl 70.60 | epoch 55 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.24 | ppl 69.41 | epoch 55 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.33 | loss 4.26 | ppl 70.97 ----------------------------------------------------------------------------------------- | end of epoch 55 | time: 516.48s | valid loss 4.61 | valid ppl 100.14 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 56 | 200/ 1568 batches | lr 30.00 | ms/batch 308.10 | loss 4.31 | ppl 74.76 | epoch 56 | 400/ 1568 batches | lr 30.00 | ms/batch 303.94 | loss 4.28 | ppl 72.13 | epoch 56 | 600/ 1568 batches | lr 30.00 | ms/batch 302.54 | loss 4.26 | ppl 71.04 | epoch 56 | 800/ 1568 batches | lr 30.00 | ms/batch 301.73 | loss 4.28 | ppl 72.08 | epoch 56 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.01 | loss 4.30 | ppl 73.74 | epoch 56 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.80 | loss 4.26 | ppl 70.60 | epoch 56 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.61 | loss 4.27 | ppl 71.40 | epoch 56 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.24 | loss 4.24 | ppl 69.37 ----------------------------------------------------------------------------------------- | end of epoch 56 | time: 515.67s | valid loss 4.61 | valid ppl 100.13 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 57 | 200/ 1568 batches | lr 30.00 | ms/batch 307.03 | loss 4.31 | ppl 74.68 | epoch 57 | 400/ 1568 batches | lr 30.00 | ms/batch 303.73 | loss 4.30 | ppl 73.41 | epoch 57 | 600/ 1568 batches | lr 30.00 | ms/batch 307.58 | loss 4.26 | ppl 70.71 | epoch 57 | 800/ 1568 batches | lr 30.00 | ms/batch 303.82 | loss 4.29 | ppl 72.69 | epoch 57 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.52 | loss 4.33 | ppl 76.07 | epoch 57 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.33 | loss 4.30 | ppl 73.42 | epoch 57 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.25 | ppl 70.25 | epoch 57 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.23 | ppl 68.40 ----------------------------------------------------------------------------------------- | end of epoch 57 | time: 515.81s | valid loss 4.61 | valid ppl 100.11 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 58 | 200/ 1568 batches | lr 30.00 | ms/batch 301.62 | loss 4.31 | ppl 74.45 | epoch 58 | 400/ 1568 batches | lr 30.00 | ms/batch 298.84 | loss 4.27 | ppl 71.73 | epoch 58 | 600/ 1568 batches | lr 30.00 | ms/batch 300.45 | loss 4.26 | ppl 71.11 | epoch 58 | 800/ 1568 batches | lr 30.00 | ms/batch 307.31 | loss 4.28 | ppl 72.04 | epoch 58 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.35 | loss 4.32 | ppl 75.32 | epoch 58 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.76 | loss 4.27 | ppl 71.86 | epoch 58 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.92 | loss 4.25 | ppl 70.43 | epoch 58 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.31 | loss 4.25 | ppl 70.33 ----------------------------------------------------------------------------------------- | end of epoch 58 | time: 516.57s | valid loss 4.61 | valid ppl 100.10 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 59 | 200/ 1568 batches | lr 30.00 | ms/batch 304.20 | loss 4.30 | ppl 73.54 | epoch 59 | 400/ 1568 batches | lr 30.00 | ms/batch 298.09 | loss 4.29 | ppl 72.89 | epoch 59 | 600/ 1568 batches | lr 30.00 | ms/batch 305.08 | loss 4.26 | ppl 70.51 | epoch 59 | 800/ 1568 batches | lr 30.00 | ms/batch 303.66 | loss 4.27 | ppl 71.18 | epoch 59 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.16 | loss 4.30 | ppl 73.79 | epoch 59 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.47 | loss 4.29 | ppl 72.70 | epoch 59 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.55 | loss 4.27 | ppl 71.31 | epoch 59 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.29 | loss 4.24 | ppl 69.17 ----------------------------------------------------------------------------------------- | end of epoch 59 | time: 515.16s | valid loss 4.61 | valid ppl 100.08 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 60 | 200/ 1568 batches | lr 30.00 | ms/batch 305.25 | loss 4.31 | ppl 74.31 | epoch 60 | 400/ 1568 batches | lr 30.00 | ms/batch 303.71 | loss 4.30 | ppl 73.49 | epoch 60 | 600/ 1568 batches | lr 30.00 | ms/batch 300.37 | loss 4.25 | ppl 70.45 | epoch 60 | 800/ 1568 batches | lr 30.00 | ms/batch 306.76 | loss 4.28 | ppl 72.10 | epoch 60 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.32 | ppl 75.35 | epoch 60 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.75 | loss 4.27 | ppl 71.21 | epoch 60 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.57 | loss 4.26 | ppl 70.72 | epoch 60 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.86 | loss 4.24 | ppl 69.19 ----------------------------------------------------------------------------------------- | end of epoch 60 | time: 515.51s | valid loss 4.61 | valid ppl 100.06 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 61 | 200/ 1568 batches | lr 30.00 | ms/batch 302.99 | loss 4.31 | ppl 74.28 | epoch 61 | 400/ 1568 batches | lr 30.00 | ms/batch 295.07 | loss 4.28 | ppl 72.10 | epoch 61 | 600/ 1568 batches | lr 30.00 | ms/batch 303.64 | loss 4.27 | ppl 71.33 | epoch 61 | 800/ 1568 batches | lr 30.00 | ms/batch 303.86 | loss 4.27 | ppl 71.36 | epoch 61 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.95 | loss 4.30 | ppl 73.52 | epoch 61 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.26 | ppl 71.12 | epoch 61 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.95 | loss 4.26 | ppl 70.87 | epoch 61 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.70 | loss 4.24 | ppl 69.32 ----------------------------------------------------------------------------------------- | end of epoch 61 | time: 514.66s | valid loss 4.61 | valid ppl 100.04 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 62 | 200/ 1568 batches | lr 30.00 | ms/batch 303.72 | loss 4.29 | ppl 72.81 | epoch 62 | 400/ 1568 batches | lr 30.00 | ms/batch 299.22 | loss 4.28 | ppl 72.39 | epoch 62 | 600/ 1568 batches | lr 30.00 | ms/batch 304.29 | loss 4.23 | ppl 68.93 | epoch 62 | 800/ 1568 batches | lr 30.00 | ms/batch 303.63 | loss 4.26 | ppl 70.78 | epoch 62 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.78 | loss 4.31 | ppl 74.07 | epoch 62 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.11 | loss 4.26 | ppl 70.73 | epoch 62 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.06 | loss 4.26 | ppl 70.49 | epoch 62 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.38 | loss 4.24 | ppl 69.28 ----------------------------------------------------------------------------------------- | end of epoch 62 | time: 516.88s | valid loss 4.61 | valid ppl 100.04 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 63 | 200/ 1568 batches | lr 30.00 | ms/batch 307.35 | loss 4.30 | ppl 73.45 | epoch 63 | 400/ 1568 batches | lr 30.00 | ms/batch 297.71 | loss 4.29 | ppl 72.90 | epoch 63 | 600/ 1568 batches | lr 30.00 | ms/batch 300.15 | loss 4.23 | ppl 68.61 | epoch 63 | 800/ 1568 batches | lr 30.00 | ms/batch 302.87 | loss 4.24 | ppl 69.65 | epoch 63 | 1000/ 1568 batches | lr 30.00 | ms/batch 306.12 | loss 4.32 | ppl 74.89 | epoch 63 | 1200/ 1568 batches | lr 30.00 | ms/batch 305.78 | loss 4.27 | ppl 71.31 | epoch 63 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.03 | loss 4.23 | ppl 69.02 | epoch 63 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.31 | loss 4.22 | ppl 68.08 ----------------------------------------------------------------------------------------- | end of epoch 63 | time: 515.47s | valid loss 4.61 | valid ppl 100.03 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 64 | 200/ 1568 batches | lr 30.00 | ms/batch 303.63 | loss 4.30 | ppl 73.87 | epoch 64 | 400/ 1568 batches | lr 30.00 | ms/batch 300.61 | loss 4.28 | ppl 72.27 | epoch 64 | 600/ 1568 batches | lr 30.00 | ms/batch 304.11 | loss 4.25 | ppl 70.39 | epoch 64 | 800/ 1568 batches | lr 30.00 | ms/batch 302.24 | loss 4.27 | ppl 71.20 | epoch 64 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.88 | loss 4.32 | ppl 75.32 | epoch 64 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.80 | loss 4.26 | ppl 71.01 | epoch 64 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.30 | loss 4.24 | ppl 69.47 | epoch 64 | 1600/ 1568 batches | lr 30.00 | ms/batch 306.80 | loss 4.24 | ppl 69.43 ----------------------------------------------------------------------------------------- | end of epoch 64 | time: 513.54s | valid loss 4.61 | valid ppl 100.02 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 65 | 200/ 1568 batches | lr 30.00 | ms/batch 303.42 | loss 4.31 | ppl 74.11 | epoch 65 | 400/ 1568 batches | lr 30.00 | ms/batch 295.42 | loss 4.27 | ppl 71.85 | epoch 65 | 600/ 1568 batches | lr 30.00 | ms/batch 302.30 | loss 4.25 | ppl 69.99 | epoch 65 | 800/ 1568 batches | lr 30.00 | ms/batch 304.99 | loss 4.25 | ppl 69.96 | epoch 65 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.32 | loss 4.32 | ppl 75.18 | epoch 65 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.21 | loss 4.27 | ppl 71.49 | epoch 65 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.62 | loss 4.23 | ppl 69.06 | epoch 65 | 1600/ 1568 batches | lr 30.00 | ms/batch 295.99 | loss 4.21 | ppl 67.35 ----------------------------------------------------------------------------------------- | end of epoch 65 | time: 514.57s | valid loss 4.61 | valid ppl 100.01 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 66 | 200/ 1568 batches | lr 30.00 | ms/batch 300.12 | loss 4.31 | ppl 74.27 | epoch 66 | 400/ 1568 batches | lr 30.00 | ms/batch 299.01 | loss 4.28 | ppl 72.35 | epoch 66 | 600/ 1568 batches | lr 30.00 | ms/batch 295.72 | loss 4.24 | ppl 69.49 | epoch 66 | 800/ 1568 batches | lr 30.00 | ms/batch 302.70 | loss 4.25 | ppl 70.01 | epoch 66 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.00 | loss 4.32 | ppl 75.22 | epoch 66 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.38 | loss 4.25 | ppl 70.33 | epoch 66 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.66 | loss 4.25 | ppl 69.97 | epoch 66 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.98 | loss 4.25 | ppl 69.98 ----------------------------------------------------------------------------------------- | end of epoch 66 | time: 514.08s | valid loss 4.61 | valid ppl 100.00 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 67 | 200/ 1568 batches | lr 30.00 | ms/batch 302.34 | loss 4.30 | ppl 73.95 | epoch 67 | 400/ 1568 batches | lr 30.00 | ms/batch 294.87 | loss 4.27 | ppl 71.61 | epoch 67 | 600/ 1568 batches | lr 30.00 | ms/batch 301.27 | loss 4.24 | ppl 69.57 | epoch 67 | 800/ 1568 batches | lr 30.00 | ms/batch 301.44 | loss 4.25 | ppl 69.87 | epoch 67 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.80 | loss 4.33 | ppl 75.87 | epoch 67 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.65 | loss 4.25 | ppl 69.92 | epoch 67 | 1400/ 1568 batches | lr 30.00 | ms/batch 306.19 | loss 4.24 | ppl 69.61 | epoch 67 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.01 | loss 4.23 | ppl 68.65 ----------------------------------------------------------------------------------------- | end of epoch 67 | time: 513.56s | valid loss 4.61 | valid ppl 100.00 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 68 | 200/ 1568 batches | lr 30.00 | ms/batch 299.76 | loss 4.29 | ppl 72.80 | epoch 68 | 400/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.28 | ppl 72.19 | epoch 68 | 600/ 1568 batches | lr 30.00 | ms/batch 301.53 | loss 4.25 | ppl 70.45 | epoch 68 | 800/ 1568 batches | lr 30.00 | ms/batch 296.72 | loss 4.24 | ppl 69.68 | epoch 68 | 1000/ 1568 batches | lr 30.00 | ms/batch 307.74 | loss 4.31 | ppl 74.25 | epoch 68 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 4.26 | ppl 70.51 | epoch 68 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.61 | loss 4.25 | ppl 70.15 | epoch 68 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.69 | loss 4.23 | ppl 68.61 ----------------------------------------------------------------------------------------- | end of epoch 68 | time: 514.24s | valid loss 4.61 | valid ppl 99.99 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 69 | 200/ 1568 batches | lr 30.00 | ms/batch 301.06 | loss 4.29 | ppl 72.68 | epoch 69 | 400/ 1568 batches | lr 30.00 | ms/batch 302.28 | loss 4.28 | ppl 72.26 | epoch 69 | 600/ 1568 batches | lr 30.00 | ms/batch 301.47 | loss 4.25 | ppl 70.23 | epoch 69 | 800/ 1568 batches | lr 30.00 | ms/batch 302.40 | loss 4.25 | ppl 69.79 | epoch 69 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.59 | loss 4.31 | ppl 74.30 | epoch 69 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.78 | loss 4.24 | ppl 69.25 | epoch 69 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.62 | loss 4.25 | ppl 70.27 | epoch 69 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.48 | loss 4.23 | ppl 68.85 ----------------------------------------------------------------------------------------- | end of epoch 69 | time: 514.69s | valid loss 4.60 | valid ppl 99.98 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 70 | 200/ 1568 batches | lr 30.00 | ms/batch 301.62 | loss 4.28 | ppl 72.60 | epoch 70 | 400/ 1568 batches | lr 30.00 | ms/batch 300.77 | loss 4.29 | ppl 72.61 | epoch 70 | 600/ 1568 batches | lr 30.00 | ms/batch 301.22 | loss 4.23 | ppl 68.44 | epoch 70 | 800/ 1568 batches | lr 30.00 | ms/batch 298.61 | loss 4.25 | ppl 69.83 | epoch 70 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.51 | loss 4.30 | ppl 73.45 | epoch 70 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.69 | loss 4.24 | ppl 69.58 | epoch 70 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.08 | loss 4.24 | ppl 69.71 | epoch 70 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.34 | loss 4.23 | ppl 68.67 ----------------------------------------------------------------------------------------- | end of epoch 70 | time: 513.39s | valid loss 4.60 | valid ppl 99.97 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 71 | 200/ 1568 batches | lr 30.00 | ms/batch 306.68 | loss 4.30 | ppl 73.43 | epoch 71 | 400/ 1568 batches | lr 30.00 | ms/batch 297.80 | loss 4.27 | ppl 71.69 | epoch 71 | 600/ 1568 batches | lr 30.00 | ms/batch 301.93 | loss 4.24 | ppl 69.11 | epoch 71 | 800/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.25 | ppl 69.82 | epoch 71 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.67 | loss 4.29 | ppl 73.26 | epoch 71 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.31 | loss 4.25 | ppl 70.23 | epoch 71 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.04 | loss 4.24 | ppl 69.11 | epoch 71 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.60 | loss 4.23 | ppl 68.92 ----------------------------------------------------------------------------------------- | end of epoch 71 | time: 513.12s | valid loss 4.60 | valid ppl 99.96 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 72 | 200/ 1568 batches | lr 30.00 | ms/batch 304.92 | loss 4.30 | ppl 73.77 | epoch 72 | 400/ 1568 batches | lr 30.00 | ms/batch 297.69 | loss 4.28 | ppl 72.37 | epoch 72 | 600/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.23 | ppl 68.54 | epoch 72 | 800/ 1568 batches | lr 30.00 | ms/batch 299.87 | loss 4.24 | ppl 69.41 | epoch 72 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.53 | loss 4.29 | ppl 72.71 | epoch 72 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.99 | loss 4.24 | ppl 69.40 | epoch 72 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.09 | loss 4.25 | ppl 70.02 | epoch 72 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.01 | loss 4.23 | ppl 68.87 ----------------------------------------------------------------------------------------- | end of epoch 72 | time: 513.20s | valid loss 4.60 | valid ppl 99.96 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 73 | 200/ 1568 batches | lr 30.00 | ms/batch 301.72 | loss 4.31 | ppl 74.17 | epoch 73 | 400/ 1568 batches | lr 30.00 | ms/batch 304.79 | loss 4.27 | ppl 71.46 | epoch 73 | 600/ 1568 batches | lr 30.00 | ms/batch 298.78 | loss 4.24 | ppl 69.21 | epoch 73 | 800/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.24 | ppl 69.74 | epoch 73 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.10 | loss 4.29 | ppl 72.88 | epoch 73 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.79 | loss 4.25 | ppl 70.36 | epoch 73 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.05 | loss 4.25 | ppl 69.97 | epoch 73 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.85 | loss 4.21 | ppl 67.40 ----------------------------------------------------------------------------------------- | end of epoch 73 | time: 514.31s | valid loss 4.60 | valid ppl 99.95 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 74 | 200/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.30 | ppl 73.41 | epoch 74 | 400/ 1568 batches | lr 30.00 | ms/batch 300.30 | loss 4.27 | ppl 71.27 | epoch 74 | 600/ 1568 batches | lr 30.00 | ms/batch 299.58 | loss 4.23 | ppl 68.61 | epoch 74 | 800/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.23 | ppl 68.98 | epoch 74 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.89 | loss 4.30 | ppl 73.94 | epoch 74 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.55 | loss 4.25 | ppl 70.29 | epoch 74 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.20 | loss 4.24 | ppl 69.55 | epoch 74 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.21 | ppl 67.54 ----------------------------------------------------------------------------------------- | end of epoch 74 | time: 513.93s | valid loss 4.60 | valid ppl 99.95 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 75 | 200/ 1568 batches | lr 30.00 | ms/batch 300.39 | loss 4.30 | ppl 73.67 | epoch 75 | 400/ 1568 batches | lr 30.00 | ms/batch 296.59 | loss 4.25 | ppl 69.79 | epoch 75 | 600/ 1568 batches | lr 30.00 | ms/batch 301.78 | loss 4.23 | ppl 68.74 | epoch 75 | 800/ 1568 batches | lr 30.00 | ms/batch 304.16 | loss 4.25 | ppl 70.11 | epoch 75 | 1000/ 1568 batches | lr 30.00 | ms/batch 294.20 | loss 4.31 | ppl 74.52 | epoch 75 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.81 | loss 4.23 | ppl 68.87 | epoch 75 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.88 | loss 4.23 | ppl 68.41 | epoch 75 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.05 | loss 4.22 | ppl 67.82 ----------------------------------------------------------------------------------------- | end of epoch 75 | time: 513.41s | valid loss 4.60 | valid ppl 99.94 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 76 | 200/ 1568 batches | lr 30.00 | ms/batch 304.49 | loss 4.29 | ppl 73.30 | epoch 76 | 400/ 1568 batches | lr 30.00 | ms/batch 301.76 | loss 4.26 | ppl 70.60 | epoch 76 | 600/ 1568 batches | lr 30.00 | ms/batch 302.91 | loss 4.23 | ppl 68.61 | epoch 76 | 800/ 1568 batches | lr 30.00 | ms/batch 300.98 | loss 4.25 | ppl 69.97 | epoch 76 | 1000/ 1568 batches | lr 30.00 | ms/batch 304.75 | loss 4.28 | ppl 72.39 | epoch 76 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.07 | loss 4.25 | ppl 70.28 | epoch 76 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.52 | loss 4.25 | ppl 70.10 | epoch 76 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.00 | loss 4.23 | ppl 69.00 ----------------------------------------------------------------------------------------- | end of epoch 76 | time: 513.57s | valid loss 4.60 | valid ppl 99.93 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 77 | 200/ 1568 batches | lr 30.00 | ms/batch 300.38 | loss 4.28 | ppl 72.59 | epoch 77 | 400/ 1568 batches | lr 30.00 | ms/batch 298.35 | loss 4.27 | ppl 71.26 | epoch 77 | 600/ 1568 batches | lr 30.00 | ms/batch 299.29 | loss 4.22 | ppl 67.76 | epoch 77 | 800/ 1568 batches | lr 30.00 | ms/batch 297.13 | loss 4.25 | ppl 70.09 | epoch 77 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.85 | loss 4.31 | ppl 74.52 | epoch 77 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.56 | loss 4.24 | ppl 69.70 | epoch 77 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.24 | ppl 69.25 | epoch 77 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.65 | loss 4.22 | ppl 67.92 ----------------------------------------------------------------------------------------- | end of epoch 77 | time: 513.43s | valid loss 4.60 | valid ppl 99.93 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 78 | 200/ 1568 batches | lr 30.00 | ms/batch 303.14 | loss 4.29 | ppl 72.88 | epoch 78 | 400/ 1568 batches | lr 30.00 | ms/batch 301.47 | loss 4.27 | ppl 71.33 | epoch 78 | 600/ 1568 batches | lr 30.00 | ms/batch 301.04 | loss 4.24 | ppl 69.20 | epoch 78 | 800/ 1568 batches | lr 30.00 | ms/batch 298.61 | loss 4.25 | ppl 69.76 | epoch 78 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.90 | loss 4.29 | ppl 72.88 | epoch 78 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.90 | loss 4.24 | ppl 69.52 | epoch 78 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.56 | loss 4.24 | ppl 69.28 | epoch 78 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.29 | loss 4.22 | ppl 68.25 ----------------------------------------------------------------------------------------- | end of epoch 78 | time: 512.55s | valid loss 4.60 | valid ppl 99.93 -----------------------------------------------------------------------------------------
| epoch 79 | 200/ 1568 batches | lr 30.00 | ms/batch 307.44 | loss 4.27 | ppl 71.84 | epoch 79 | 400/ 1568 batches | lr 30.00 | ms/batch 303.59 | loss 4.26 | ppl 71.08 | epoch 79 | 600/ 1568 batches | lr 30.00 | ms/batch 301.58 | loss 4.23 | ppl 68.75 | epoch 79 | 800/ 1568 batches | lr 30.00 | ms/batch 302.85 | loss 4.24 | ppl 69.57 | epoch 79 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.91 | loss 4.29 | ppl 72.90 | epoch 79 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.61 | loss 4.26 | ppl 70.84 | epoch 79 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.63 | loss 4.23 | ppl 68.71 | epoch 79 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.51 | loss 4.22 | ppl 68.16 ----------------------------------------------------------------------------------------- | end of epoch 79 | time: 512.95s | valid loss 4.60 | valid ppl 99.93 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 80 | 200/ 1568 batches | lr 30.00 | ms/batch 303.87 | loss 4.27 | ppl 71.71 | epoch 80 | 400/ 1568 batches | lr 30.00 | ms/batch 298.28 | loss 4.25 | ppl 70.00 | epoch 80 | 600/ 1568 batches | lr 30.00 | ms/batch 303.15 | loss 4.20 | ppl 66.95 | epoch 80 | 800/ 1568 batches | lr 30.00 | ms/batch 302.85 | loss 4.25 | ppl 70.26 | epoch 80 | 1000/ 1568 batches | lr 30.00 | ms/batch 294.84 | loss 4.29 | ppl 72.95 | epoch 80 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.23 | ppl 68.96 | epoch 80 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.00 | loss 4.23 | ppl 68.60 | epoch 80 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.92 | loss 4.21 | ppl 67.37 ----------------------------------------------------------------------------------------- | end of epoch 80 | time: 513.59s | valid loss 4.60 | valid ppl 99.92 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 81 | 200/ 1568 batches | lr 30.00 | ms/batch 305.85 | loss 4.28 | ppl 71.95 | epoch 81 | 400/ 1568 batches | lr 30.00 | ms/batch 298.75 | loss 4.26 | ppl 71.05 | epoch 81 | 600/ 1568 batches | lr 30.00 | ms/batch 302.45 | loss 4.23 | ppl 68.62 | epoch 81 | 800/ 1568 batches | lr 30.00 | ms/batch 303.43 | loss 4.26 | ppl 70.79 | epoch 81 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.08 | loss 4.27 | ppl 71.46 | epoch 81 | 1200/ 1568 batches | lr 30.00 | ms/batch 303.50 | loss 4.25 | ppl 69.90 | epoch 81 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.82 | loss 4.24 | ppl 69.39 | epoch 81 | 1600/ 1568 batches | lr 30.00 | ms/batch 303.90 | loss 4.21 | ppl 67.39 ----------------------------------------------------------------------------------------- | end of epoch 81 | time: 513.03s | valid loss 4.60 | valid ppl 99.92 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 82 | 200/ 1568 batches | lr 30.00 | ms/batch 300.46 | loss 4.28 | ppl 71.95 | epoch 82 | 400/ 1568 batches | lr 30.00 | ms/batch 301.76 | loss 4.27 | ppl 71.41 | epoch 82 | 600/ 1568 batches | lr 30.00 | ms/batch 302.68 | loss 4.21 | ppl 67.37 | epoch 82 | 800/ 1568 batches | lr 30.00 | ms/batch 302.13 | loss 4.26 | ppl 70.61 | epoch 82 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.93 | loss 4.29 | ppl 73.16 | epoch 82 | 1200/ 1568 batches | lr 30.00 | ms/batch 296.80 | loss 4.25 | ppl 69.91 | epoch 82 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.22 | loss 4.22 | ppl 67.73 | epoch 82 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.25 | loss 4.22 | ppl 67.85 ----------------------------------------------------------------------------------------- | end of epoch 82 | time: 513.63s | valid loss 4.60 | valid ppl 99.91 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 83 | 200/ 1568 batches | lr 30.00 | ms/batch 305.41 | loss 4.28 | ppl 72.44 | epoch 83 | 400/ 1568 batches | lr 30.00 | ms/batch 297.75 | loss 4.27 | ppl 71.69 | epoch 83 | 600/ 1568 batches | lr 30.00 | ms/batch 299.63 | loss 4.21 | ppl 67.31 | epoch 83 | 800/ 1568 batches | lr 30.00 | ms/batch 301.66 | loss 4.23 | ppl 68.88 | epoch 83 | 1000/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.28 | ppl 72.58 | epoch 83 | 1200/ 1568 batches | lr 30.00 | ms/batch 297.33 | loss 4.23 | ppl 68.37 | epoch 83 | 1400/ 1568 batches | lr 30.00 | ms/batch 298.71 | loss 4.22 | ppl 68.08 | epoch 83 | 1600/ 1568 batches | lr 30.00 | ms/batch 300.56 | loss 4.20 | ppl 66.71 ----------------------------------------------------------------------------------------- | end of epoch 83 | time: 512.86s | valid loss 4.60 | valid ppl 99.91 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 84 | 200/ 1568 batches | lr 30.00 | ms/batch 300.96 | loss 4.28 | ppl 71.90 | epoch 84 | 400/ 1568 batches | lr 30.00 | ms/batch 296.48 | loss 4.27 | ppl 71.57 | epoch 84 | 600/ 1568 batches | lr 30.00 | ms/batch 289.79 | loss 4.22 | ppl 68.16 | epoch 84 | 800/ 1568 batches | lr 30.00 | ms/batch 302.24 | loss 4.24 | ppl 69.37 | epoch 84 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.49 | loss 4.30 | ppl 73.93 | epoch 84 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.62 | loss 4.22 | ppl 68.35 | epoch 84 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.16 | loss 4.22 | ppl 68.02 | epoch 84 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.58 | loss 4.23 | ppl 68.78 ----------------------------------------------------------------------------------------- | end of epoch 84 | time: 514.03s | valid loss 4.60 | valid ppl 99.91 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 85 | 200/ 1568 batches | lr 30.00 | ms/batch 305.91 | loss 4.28 | ppl 72.54 | epoch 85 | 400/ 1568 batches | lr 30.00 | ms/batch 298.51 | loss 4.26 | ppl 70.81 | epoch 85 | 600/ 1568 batches | lr 30.00 | ms/batch 299.23 | loss 4.22 | ppl 67.94 | epoch 85 | 800/ 1568 batches | lr 30.00 | ms/batch 305.08 | loss 4.25 | ppl 70.12 | epoch 85 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.05 | loss 4.29 | ppl 72.64 | epoch 85 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.71 | loss 4.24 | ppl 69.47 | epoch 85 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.49 | loss 4.23 | ppl 68.88 | epoch 85 | 1600/ 1568 batches | lr 30.00 | ms/batch 304.79 | loss 4.21 | ppl 67.44 ----------------------------------------------------------------------------------------- | end of epoch 85 | time: 513.54s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 86 | 200/ 1568 batches | lr 30.00 | ms/batch 302.03 | loss 4.29 | ppl 72.81 | epoch 86 | 400/ 1568 batches | lr 30.00 | ms/batch 298.58 | loss 4.26 | ppl 70.65 | epoch 86 | 600/ 1568 batches | lr 30.00 | ms/batch 300.19 | loss 4.21 | ppl 67.46 | epoch 86 | 800/ 1568 batches | lr 30.00 | ms/batch 302.00 | loss 4.25 | ppl 69.82 | epoch 86 | 1000/ 1568 batches | lr 30.00 | ms/batch 300.31 | loss 4.29 | ppl 72.95 | epoch 86 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.16 | loss 4.23 | ppl 68.70 | epoch 86 | 1400/ 1568 batches | lr 30.00 | ms/batch 304.17 | loss 4.24 | ppl 69.47 | epoch 86 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.25 | loss 4.21 | ppl 67.25 ----------------------------------------------------------------------------------------- | end of epoch 86 | time: 512.91s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 87 | 200/ 1568 batches | lr 30.00 | ms/batch 298.80 | loss 4.28 | ppl 72.20 | epoch 87 | 400/ 1568 batches | lr 30.00 | ms/batch 297.81 | loss 4.26 | ppl 70.79 | epoch 87 | 600/ 1568 batches | lr 30.00 | ms/batch 297.51 | loss 4.23 | ppl 69.02 | epoch 87 | 800/ 1568 batches | lr 30.00 | ms/batch 301.27 | loss 4.24 | ppl 69.26 | epoch 87 | 1000/ 1568 batches | lr 30.00 | ms/batch 297.51 | loss 4.29 | ppl 72.82 | epoch 87 | 1200/ 1568 batches | lr 30.00 | ms/batch 300.76 | loss 4.24 | ppl 69.10 | epoch 87 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.72 | loss 4.22 | ppl 68.10 | epoch 87 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.27 | loss 4.20 | ppl 66.67 ----------------------------------------------------------------------------------------- | end of epoch 87 | time: 513.70s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 88 | 200/ 1568 batches | lr 30.00 | ms/batch 306.95 | loss 4.28 | ppl 72.02 | epoch 88 | 400/ 1568 batches | lr 30.00 | ms/batch 295.81 | loss 4.25 | ppl 70.23 | epoch 88 | 600/ 1568 batches | lr 30.00 | ms/batch 299.40 | loss 4.21 | ppl 67.06 | epoch 88 | 800/ 1568 batches | lr 30.00 | ms/batch 306.03 | loss 4.24 | ppl 69.46 | epoch 88 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.04 | loss 4.28 | ppl 72.16 | epoch 88 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.16 | loss 4.24 | ppl 69.47 | epoch 88 | 1400/ 1568 batches | lr 30.00 | ms/batch 303.97 | loss 4.23 | ppl 68.68 | epoch 88 | 1600/ 1568 batches | lr 30.00 | ms/batch 298.91 | loss 4.21 | ppl 67.27 ----------------------------------------------------------------------------------------- | end of epoch 88 | time: 513.49s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Saving Averaged! 
| epoch 89 | 200/ 1568 batches | lr 30.00 | ms/batch 297.42 | loss 4.26 | ppl 71.10 | epoch 89 | 400/ 1568 batches | lr 30.00 | ms/batch 300.14 | loss 4.26 | ppl 70.96 | epoch 89 | 600/ 1568 batches | lr 30.00 | ms/batch 298.01 | loss 4.22 | ppl 67.72 | epoch 89 | 800/ 1568 batches | lr 30.00 | ms/batch 300.68 | loss 4.22 | ppl 68.06 | epoch 89 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.21 | loss 4.29 | ppl 73.22 | epoch 89 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.02 | loss 4.24 | ppl 69.39 | epoch 89 | 1400/ 1568 batches | lr 30.00 | ms/batch 307.29 | loss 4.21 | ppl 67.09 | epoch 89 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.28 | loss 4.20 | ppl 66.66 ----------------------------------------------------------------------------------------- | end of epoch 89 | time: 513.03s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Saving Averaged! | epoch 90 | 200/ 1568 batches | lr 30.00 | ms/batch 301.28 | loss 4.26 | ppl 71.05 | epoch 90 | 400/ 1568 batches | lr 30.00 | ms/batch 298.55 | loss 4.26 | ppl 70.64 | epoch 90 | 600/ 1568 batches | lr 30.00 | ms/batch 298.79 | loss 4.21 | ppl 67.54 | epoch 90 | 800/ 1568 batches | lr 30.00 | ms/batch 300.81 | loss 4.23 | ppl 68.61 | epoch 90 | 1000/ 1568 batches | lr 30.00 | ms/batch 303.49 | loss 4.28 | ppl 72.40 | epoch 90 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.26 | loss 4.24 | ppl 69.58 | epoch 90 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.59 | loss 4.22 | ppl 68.20 | epoch 90 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.12 | loss 4.20 | ppl 66.95 ----------------------------------------------------------------------------------------- | end of epoch 90 | time: 513.27s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- | epoch 91 | 200/ 1568 batches | lr 30.00 | ms/batch 303.62 | loss 4.27 | ppl 71.87 | epoch 91 | 400/ 1568 batches | lr 30.00 | 
ms/batch 294.00 | loss 4.26 | ppl 70.46 | epoch 91 | 600/ 1568 batches | lr 30.00 | ms/batch 299.43 | loss 4.22 | ppl 67.85 | epoch 91 | 800/ 1568 batches | lr 30.00 | ms/batch 299.18 | loss 4.23 | ppl 68.64 | epoch 91 | 1000/ 1568 batches | lr 30.00 | ms/batch 302.35 | loss 4.29 | ppl 73.20 | epoch 91 | 1200/ 1568 batches | lr 30.00 | ms/batch 302.46 | loss 4.23 | ppl 68.45 | epoch 91 | 1400/ 1568 batches | lr 30.00 | ms/batch 299.93 | loss 4.22 | ppl 67.70 | epoch 91 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.96 | loss 4.22 | ppl 68.29 ----------------------------------------------------------------------------------------- | end of epoch 91 | time: 513.03s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- | epoch 92 | 200/ 1568 batches | lr 30.00 | ms/batch 299.51 | loss 4.29 | ppl 72.78 | epoch 92 | 400/ 1568 batches | lr 30.00 | ms/batch 297.29 | loss 4.26 | ppl 70.66 | epoch 92 | 600/ 1568 batches | lr 30.00 | ms/batch 300.78 | loss 4.22 | ppl 67.78 | epoch 92 | 800/ 1568 batches | lr 30.00 | ms/batch 298.43 | loss 4.23 | ppl 68.74 | epoch 92 | 1000/ 1568 batches | lr 30.00 | ms/batch 301.99 | loss 4.28 | ppl 72.21 | epoch 92 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.50 | loss 4.24 | ppl 69.75 | epoch 92 | 1400/ 1568 batches | lr 30.00 | ms/batch 297.91 | loss 4.22 | ppl 68.07 | epoch 92 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.21 | loss 4.20 | ppl 66.53 ----------------------------------------------------------------------------------------- | end of epoch 92 | time: 511.80s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- | epoch 93 | 200/ 1568 batches | lr 30.00 | ms/batch 306.71 | loss 4.28 | ppl 72.36 | epoch 93 | 400/ 1568 batches | lr 30.00 | ms/batch 297.53 | loss 4.25 | ppl 70.42 | epoch 93 | 600/ 1568 batches | lr 30.00 | ms/batch 300.14 | loss 4.22 | ppl 67.85 | epoch 93 | 800/ 1568 
batches | lr 30.00 | ms/batch 301.17 | loss 4.22 | ppl 67.95 | epoch 93 | 1000/ 1568 batches | lr 30.00 | ms/batch 298.70 | loss 4.28 | ppl 72.36 | epoch 93 | 1200/ 1568 batches | lr 30.00 | ms/batch 299.84 | loss 4.25 | ppl 70.34 | epoch 93 | 1400/ 1568 batches | lr 30.00 | ms/batch 302.05 | loss 4.21 | ppl 67.37 | epoch 93 | 1600/ 1568 batches | lr 30.00 | ms/batch 299.13 | loss 4.21 | ppl 67.24 ----------------------------------------------------------------------------------------- | end of epoch 93 | time: 513.10s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- | epoch 94 | 200/ 1568 batches | lr 30.00 | ms/batch 299.72 | loss 4.27 | ppl 71.68 | epoch 94 | 400/ 1568 batches | lr 30.00 | ms/batch 297.28 | loss 4.25 | ppl 70.25 | epoch 94 | 600/ 1568 batches | lr 30.00 | ms/batch 299.73 | loss 4.20 | ppl 66.71 | epoch 94 | 800/ 1568 batches | lr 30.00 | ms/batch 298.39 | loss 4.22 | ppl 67.81 | epoch 94 | 1000/ 1568 batches | lr 30.00 | ms/batch 292.99 | loss 4.28 | ppl 71.92 | epoch 94 | 1200/ 1568 batches | lr 30.00 | ms/batch 298.40 | loss 4.22 | ppl 68.25 | epoch 94 | 1400/ 1568 batches | lr 30.00 | ms/batch 301.74 | loss 4.22 | ppl 68.09 | epoch 94 | 1600/ 1568 batches | lr 30.00 | ms/batch 302.24 | loss 4.23 | ppl 68.39 ----------------------------------------------------------------------------------------- | end of epoch 94 | time: 512.52s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- | epoch 95 | 200/ 1568 batches | lr 30.00 | ms/batch 303.65 | loss 4.26 | ppl 71.11 | epoch 95 | 400/ 1568 batches | lr 30.00 | ms/batch 296.43 | loss 4.27 | ppl 71.33 | epoch 95 | 600/ 1568 batches | lr 30.00 | ms/batch 302.97 | loss 4.21 | ppl 67.44 | epoch 95 | 800/ 1568 batches | lr 30.00 | ms/batch 299.60 | loss 4.23 | ppl 68.88 | epoch 95 | 1000/ 1568 batches | lr 30.00 | ms/batch 305.64 | loss 4.28 | ppl 71.91 | 
epoch 95 | 1200/ 1568 batches | lr 30.00 | ms/batch 301.70 | loss 4.23 | ppl 68.62 | epoch 95 | 1400/ 1568 batches | lr 30.00 | ms/batch 300.91 | loss 4.22 | ppl 67.91 | epoch 95 | 1600/ 1568 batches | lr 30.00 | ms/batch 301.06 | loss 4.20 | ppl 66.76 ----------------------------------------------------------------------------------------- | end of epoch 95 | time: 512.41s | valid loss 4.60 | valid ppl 99.90 ----------------------------------------------------------------------------------------- Done! (awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python pointer.py --save March-2018_16mb_QRNN_nhid750_batch20_nlayers3_emsize300.pt --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 1000 --data data/March-2018_16mb --model QRNN --epochs 500 --data data/March-2018_16mb --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1500 --nlayers 4 --seed 4002 --model QRNN --wdrop 0.1 --batch_size 20 --emsize=400 --save models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt usage: pointer.py [-h] [--data DATA] [--model MODEL] [--save SAVE] [--cuda] [--bptt BPTT] [--window WINDOW] [--theta THETA] [--lambdasm LAMBDASM] pointer.py: error: unrecognized arguments: --epochs 500 --clip 0.25 --dropouti 0.4 --dropouth 0.2 --nhid 1500 --nlayers 4 --seed 4002 --wdrop 0.1 --batch_size 20 --emsize=400 (awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python pointer.py --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/March-2018_16mb --model QRNN --epochs 500 --save models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt usage: pointer.py [-h] [--data DATA] [--model MODEL] [--save SAVE] [--cuda] [--bptt BPTT] [--window WINDOW] [--theta THETA] [--lambdasm LAMBDASM] pointer.py: error: unrecognized arguments: --epochs 500 (awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python pointer.py --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/March-2018_16mb --model QRNN --save
models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt RNNModel( (lockdrop): LockedDropout( ) (idrop): Dropout(p=0.4) (hdrop): Dropout(p=0.2) (drop): Dropout(p=0.4) (encoder): Embedding(160308, 400) (rnns): ModuleList( (0): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=800, out_features=4500) ) ) (1): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ) (2): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ) (3): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=1200) ) ) ) (decoder): Linear(in_features=1500, out_features=160308) ) pointer.py:93: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. softmax_output_flat = torch.nn.functional.softmax(output_flat) 2000 407979 1109.8855674949875 pointer.py:101: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. 
ptr_attn = torch.nn.functional.softmax(theta * logits).view(-1, 1) 4000 407979 751.1170469566459 6000 407979 600.7955793989198 THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory Traceback (most recent call last): File "pointer.py", line 124, in <module> val_loss = evaluate(val_data, test_batch_size) File "pointer.py", line 77, in evaluate next_word_history = torch.cat([one_hot(t.data[0], ntokens) for t in targets]) if next_word_history is None else torch.cat([next_word_history, torch.cat([one_hot(t.data[0], ntokens) for t in targets])]) RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58 (awd-py36) jhave@jhave-Ubuntu:~/Documents/Github/awd-lstm-lm-master$ python pointer.py --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 1000 --data data/March-2018_16mb --model QRNN --save models/March-2018_16mb_QRNN_nhid1500_batch20_nlayers4_emsize400.pt RNNModel( (lockdrop): LockedDropout( ) (idrop): Dropout(p=0.4) (hdrop): Dropout(p=0.2) (drop): Dropout(p=0.4) (encoder): Embedding(160308, 400) (rnns): ModuleList( (0): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=800, out_features=4500) ) ) (1): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ) (2): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=4500) ) ) (3): QRNNLayer( (linear): WeightDrop( (module): Linear(in_features=1500, out_features=1200) ) ) ) (decoder): Linear(in_features=1500, out_features=160308) ) pointer.py:93: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
softmax_output_flat = torch.nn.functional.softmax(output_flat) 1000 407979 1028.6574008357375 2000 407979 1109.8855674949875 3000 407979 782.8378800697892 pointer.py:101: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument. ptr_attn = torch.nn.functional.softmax(theta * logits).view(-1, 1) 4000 407979 751.1170469566459 5000 407979 675.8394393324043 6000 407979 600.7818762936812 7000 407979 514.3363528212429 8000 407979 509.76886807527984 9000 407979 527.9799935609335 10000 407979 525.4413183571007 11000 407979 513.6474101118994 12000 407979 468.926449052666 13000 407979 472.3487610561114 14000 407979 461.5233451336913 15000 407979 440.05524383375433 16000 407979 444.6987586109729 17000 407979 444.61286585457054 18000 407979 435.7451752144919 19000 407979 427.6885572352174 20000 407979 425.49468499244296 21000 407979 422.6822943423199 22000 407979 417.5152134097812 23000 407979 420.81884211676953 24000 407979 428.7017735785528 25000 407979 431.29654796552444 26000 407979 443.245790723414 27000 407979 454.0416179706655 28000 407979 470.5047059689454 29000 407979 458.3143473867959 30000 407979 449.76870915969454 31000 407979 457.28105850610916 32000 407979 448.20550742026796 33000 407979 444.02341946256524 34000 407979 445.04842985783085 35000 407979 441.11234952016446 36000 407979 443.07925983944256 37000 407979 442.86575719125517 38000 407979 443.656930733399 39000 407979 442.73670318799196 40000 407979 439.373565335941 41000 407979 431.39668191684865 42000 407979 426.65055077444015 43000 407979 444.8210918800004 44000 407979 445.17816602516785 45000 407979 438.79047756084907 46000 407979 436.3519848042288 47000 407979 439.5602647423701 48000 407979 440.02004917548805 49000 407979 440.3577441220625 50000 407979 442.9285707666926 51000 407979 441.66957426970976 52000 407979 438.8940162016029 53000 407979 444.07328831790215 54000 407979 446.1055880300406 55000 407979 444.922961273292 56000 407979 
448.0290108462899 57000 407979 443.53228400307285 58000 407979 433.798037689121 59000 407979 429.0291469166777 60000 407979 425.30446574906364 61000 407979 424.23192473487944 62000 407979 425.71338188384374 63000 407979 427.6051486358974 64000 407979 430.5632430625364 65000 407979 432.66057537316436 66000 407979 432.81946466895494 67000 407979 435.3114898101322 68000 407979 435.88043869049824 69000 407979 440.63364323850374 70000 407979 441.43405347676344 71000 407979 443.6602693300051 72000 407979 444.236163481181 73000 407979 439.0762034989461 74000 407979 440.7918735270048 75000 407979 445.41993024504444 76000 407979 450.4158379364601 77000 407979 456.155172997314 78000 407979 457.45755253590437 79000 407979 459.3912411947543 80000 407979 463.1734729687934 81000 407979 463.2219249247582 82000 407979 463.4006067195675 83000 407979 466.82592544662674 84000 407979 469.9382804551609 85000 407979 471.8236832836256 86000 407979 473.9030502667837 87000 407979 475.5567969653097 88000 407979 473.2509288072543 89000 407979 475.36726989990495 90000 407979 477.16703975930807 91000 407979 479.8188697640783 92000 407979 478.77467386006094 93000 407979 484.9910811552948 94000 407979 483.1443258970524 95000 407979 481.88863164781986 96000 407979 485.8521321087937 97000 407979 482.40964801556294 98000 407979 483.5627769125748 99000 407979 487.4521640883732 100000 407979 489.5915384545111 101000 407979 494.4268952103675 102000 407979 497.1319252477158 103000 407979 497.0691453345401 104000 407979 495.42801248322644 105000 407979 495.3341604602157 106000 407979 497.6364411202307 107000 407979 492.0077164318757 108000 407979 493.1178560856576 109000 407979 491.5969031480069 110000 407979 493.8137768942895 111000 407979 492.1890265696371 112000 407979 492.705851979313 113000 407979 491.43381355068215 114000 407979 486.7968355679367 115000 407979 486.08155952582194 116000 407979 485.62538281394893 117000 407979 485.581879690658 118000 407979 485.7811254593517 119000 407979 
486.7181060370632 120000 407979 483.9291500985168 121000 407979 477.88622085814734 122000 407979 477.2724956358356 123000 407979 479.3320324609323 124000 407979 480.4806007984131 125000 407979 480.7070724432998 126000 407979 481.5449395822369 127000 407979 481.4903050775703 128000 407979 482.09553766684434 129000 407979 482.11962735013253 130000 407979 483.1804911923166 131000 407979 481.30410287847326 132000 407979 479.40730558179246 133000 407979 480.1502055824448 134000 407979 480.5890851304339 135000 407979 479.7543811893013 136000 407979 477.98679173012215 137000 407979 472.0190069184201 138000 407979 468.4406577520939 139000 407979 467.9651988380303 140000 407979 465.80497987582214 141000 407979 467.67725269003677 142000 407979 468.81563973594547 143000 407979 467.1557100037877 144000 407979 466.96661576167253 145000 407979 467.6102357389057 146000 407979 467.213002012793 147000 407979 467.26850185546806 148000 407979 466.69351999578123 149000 407979 467.2280698450127 150000 407979 471.365708327505 151000 407979 473.20794843442854 152000 407979 473.69624158350285 153000 407979 473.8533182295189 154000 407979 476.7603064140188 155000 407979 476.4774614286919 156000 407979 475.74488112826106 157000 407979 473.426716926651 158000 407979 471.01444210357164 159000 407979 471.07587971152566 160000 407979 472.4114613737927 161000 407979 472.6632840725371 162000 407979 472.6227579260125 163000 407979 471.3709692282742 164000 407979 470.64446164947884 165000 407979 471.6962470515647 166000 407979 473.61089864445967 167000 407979 469.56421986754265 168000 407979 470.9916381410587 169000 407979 473.51219125671463 170000 407979 474.13712156806383 171000 407979 473.7543995582381 172000 407979 474.0140245625647 173000 407979 473.32985361085946 174000 407979 473.95022822371936 175000 407979 473.37668283347807 176000 407979 474.81976616755634 177000 407979 474.67310384328005 178000 407979 475.72111328226174 179000 407979 475.5797280259783 180000 407979 476.07914006708864 
181000 407979 478.9313058978597 182000 407979 478.42317536292234 183000 407979 477.1234078775403 184000 407979 476.78561755219596 185000 407979 483.78194355860734 186000 407979 484.10495873413345 187000 407979 483.24781057783014 188000 407979 483.2421333104883 189000 407979 481.8101054294817 190000 407979 481.7661343499296 191000 407979 481.7183430280171 192000 407979 481.2226051345749 193000 407979 480.6853332355457 194000 407979 480.3176995420485 195000 407979 480.44807815972104 196000 407979 478.83579735343784 197000 407979 479.1119530386364 198000 407979 477.51421963926316 199000 407979 476.05075025115985 200000 407979 475.93428562268167 201000 407979 475.3230965380323 202000 407979 476.57485257555913 203000 407979 476.7378194648435 204000 407979 476.2010627240477 205000 407979 477.27507623803933 206000 407979 478.30571101376864 207000 407979 477.66040296764044 208000 407979 475.49833862088553 209000 407979 476.10626704048826 210000 407979 475.94966345872626 211000 407979 476.1305403947513 212000 407979 476.8127872134246 213000 407979 476.56423451545777 214000 407979 476.7566867449324 215000 407979 477.8214865621595 216000 407979 477.94200958308954 217000 407979 478.9608821369799 218000 407979 477.82467011291624 219000 407979 477.1920629452952 220000 407979 477.1300500974579 221000 407979 475.85466999965405 222000 407979 475.48162318489943 223000 407979 476.7519011632038 224000 407979 475.21438008034687 225000 407979 474.8229523559287 226000 407979 472.5759836643728 227000 407979 472.6180257593078 228000 407979 474.5986068570789 229000 407979 475.04111348020774 230000 407979 477.5263120614302 231000 407979 478.3020581992768 232000 407979 476.93679309859493 233000 407979 471.85079123132215 234000 407979 466.44332262617144 235000 407979 460.8529790516079 236000 407979 455.2254584998325 237000 407979 450.00566652958395 238000 407979 444.39119289925037 239000 407979 440.26830153968524 240000 407979 435.404301545569 241000 407979 430.58638829053143 242000 407979 
427.0097383787177 243000 407979 423.39882220483406 244000 407979 419.7668932336134 245000 407979 417.202388720352 246000 407979 415.16818194274106 247000 407979 411.1719483564902 248000 407979 406.25763983700983 249000 407979 401.435746584984 250000 407979 396.30743108200505 251000 407979 391.4446374388238 252000 407979 384.9602985339503 253000 407979 378.94068704268915 254000 407979 372.9823081158486 255000 407979 367.4762354060336 256000 407979 362.40824582598674 257000 407979 357.41722866185324 258000 407979 352.9963753793859 259000 407979 347.44292585322546 260000 407979 341.72883418791645 261000 407979 336.5695425656947 262000 407979 331.8318833548202 263000 407979 327.1933404618222 264000 407979 322.88001287803314 265000 407979 318.31984700405076 266000 407979 313.51424778382466 267000 407979 309.21078920913595 268000 407979 304.78458163672127 269000 407979 300.5695599321621 270000 407979 296.40548819927807 271000 407979 292.5430211166561 272000 407979 289.92138157313315 273000 407979 287.38068362899253 274000 407979 284.60474275201705 275000 407979 281.7846296437301 276000 407979 279.3438014700543 277000 407979 275.0284622453173 278000 407979 270.85578589797575 279000 407979 266.61071943831115 280000 407979 263.2617869408049 281000 407979 260.17913862701647 282000 407979 258.2567748973083 283000 407979 255.95162538593573 284000 407979 253.55498687557716 285000 407979 251.11171529893682 286000 407979 248.6309913052103 287000 407979 245.98329755226936 288000 407979 243.17277456487895 289000 407979 240.9212407325017 290000 407979 238.3867136684828 291000 407979 234.74331833632024 292000 407979 230.90768447299152 293000 407979 227.33356720399144 294000 407979 224.3175200442951 295000 407979 222.57217013657157 296000 407979 220.8895572398521 297000 407979 218.72985798271836 298000 407979 215.95562581941786 299000 407979 214.22258897320748 300000 407979 211.51622954902555 301000 407979 208.79250281571444 302000 407979 206.93277317637916 303000 407979 
205.43461699303487 304000 407979 203.8665170066292 305000 407979 202.15555705278018 306000 407979 200.23181704946074 307000 407979 198.22278057997354 308000 407979 196.43748917156046 309000 407979 194.8051544176797 310000 407979 193.0668790137287 311000 407979 190.33646588923048 312000 407979 187.58408741824863 313000 407979 184.96821964498073 314000 407979 183.10209778523787 315000 407979 181.68229747654203 316000 407979 180.5450411512074 317000 407979 179.11810767720465 318000 407979 177.83708131822866 319000 407979 176.80467084534493 320000 407979 175.6654942488912 321000 407979 174.42046244019582 322000 407979 173.13966077734912 323000 407979 172.1834073437297 324000 407979 171.20285584010517 325000 407979 170.16048540212114 326000 407979 168.6466365229815 327000 407979 167.56609106621676 328000 407979 166.38834492930926 329000 407979 165.37442731655048 330000 407979 164.55106387724072 331000 407979 163.85297414747737 332000 407979 163.0038593585234 333000 407979 162.0867702377986 334000 407979 161.34258369482274 335000 407979 160.47295930373846 336000 407979 159.7577545037546 337000 407979 159.03249186672844 338000 407979 158.2629260363988 339000 407979 157.28369856951954 340000 407979 156.1146205030185 341000 407979 154.97440416837708 342000 407979 153.9120021904323 343000 407979 152.7781510238908 344000 407979 151.88258200288902 345000 407979 150.94854001015742 346000 407979 149.97719812253422 347000 407979 148.50281530470215 348000 407979 146.6421307943444 349000 407979 145.01265124894195 350000 407979 143.31557394310826 351000 407979 141.98734348723738 352000 407979 140.50674998757668 353000 407979 139.22874902728776 354000 407979 138.30410740082655 355000 407979 137.02338872658646 356000 407979 135.85669571896457 357000 407979 134.7055982819092 358000 407979 133.90978732471575 359000 407979 132.57073940433824 360000 407979 131.239581254117 361000 407979 129.83691041014117 362000 407979 128.36316005024335 363000 407979 126.81724299758174 364000 407979 
125.59011345833679 365000 407979 124.65456483493487 366000 407979 123.91730578849808 367000 407979 123.18602758503636 368000 407979 122.4623689681373 369000 407979 121.75705693044196 370000 407979 120.99379944648038 371000 407979 120.25038710889447 372000 407979 118.9847994192118 373000 407979 118.09402086555887 374000 407979 117.22948305925254 375000 407979 116.27699079793659 376000 407979 115.23039098892271 377000 407979 114.44561051024925 378000 407979 113.44636945775393 379000 407979 112.67785504824309 380000 407979 111.60711599474463 381000 407979 110.55155850435521 382000 407979 109.59301053557306 383000 407979 108.91839720684955 384000 407979 108.4116222760671 385000 407979 107.69950824999182 386000 407979 107.1641164221052 387000 407979 106.72180008034556 388000 407979 106.21923367600206 389000 407979 105.77945681489508 390000 407979 105.06798237458598 391000 407979 104.48459676293454 392000 407979 103.94037237666743 393000 407979 103.4478001591831 394000 407979 102.81055596821764 395000 407979 102.03062960353157 396000 407979 101.25304630076188 397000 407979 100.46772737920095 398000 407979 99.68573339632539 399000 407979 99.01426350934578 400000 407979 98.25354570038122 401000 407979 97.8273020153582 402000 407979 97.57267126342245 403000 407979 97.20577694912319 404000 407979 96.87676150748695 405000 407979 96.40974340238431 406000 407979 96.06404947122412 407000 407979 95.81879880702914 ========================================================================================= | End of pointer | val loss 4.56 | val ppl 95.69 ========================================================================================= 1000 667489 346.4848837870787 2000 667489 370.10544977391004 3000 667489 430.12490529695833 THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory Traceback (most recent call last): File "pointer.py", line 131, in <module> test_loss = evaluate(test_data, test_batch_size)
File "pointer.py", line 102, in evaluate ptr_dist = (ptr_attn.expand_as(valid_next_word) * valid_next_word).sum(0).squeeze() RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58