{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing and Comparing EMLO Collections\n", "\n", "The EMLO project contains dozens of correspondence collections centered around different historical figures. Each collection is either maintained by a single institution or is a merger of smaller collections maintained across multiple institutions. \n", "\n", "The metadata of the correspondences has been mapped to a single schema.\n", "\n", "Comparing different sets of correspondences at different scales draws attention to different aspects of the comparison. At the same time, it brings to the surface some differences in how the digital collections were shaped by selection criteria.\n", "\n", "At a small scale, it is easy to see, for instance, that a collection around a historical figure such as Samuel Hartlib or Françoise de Graffigny contains not only letters authored by or addressed to that figure, but also some letters between the correspondents in their networks. When working with many correspondence collections with thousands or tens of thousands of letters, this detail is easily lost in overviews of metadata records and in most summary statistics.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import glob\n", "import matplotlib.pyplot as plt\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first load the data into a dataframe and inspect a number of rows so we get an idea of what is in there." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Unnamed: 0 id type \\\n", "0 0 577c226e-adfa-43ed-8db2-95792b73f3c3 Letter \n", "1 1 0ccd18c5-de76-4c66-a7b5-fbc344a9dcde Letter \n", "2 2 2e189892-8105-42e9-876c-02aaa97fb877 Letter \n", "3 3 da35bb30-cfe1-4b4a-957d-42742088e693 Letter \n", "4 4 c5fefa45-f197-4bc3-bf52-1272f7f1d573 Letter \n", "... ... ... ... \n", "132205 23 a9088a3f-3eb2-4039-aeaa-8ad98b9e6f35 Letter \n", "132206 24 1759be67-e3d3-4e30-a987-b0f84b212f03 Letter \n", "132207 25 c1c3e71d-9ca1-4953-968a-b5a8e5d3c00b Letter \n", "132208 26 1b82d6d1-8b0c-4bc2-aa57-c856c73c2e66 Letter \n", "132209 27 06dc5c13-cec1-4fcd-972e-0d6052350645 Letter \n", "\n", " collection date author \\\n", "0 Bayle, Pierre 21 February 1662 Bayle, Jacob, 1644-1685 \n", "1 Bayle, Pierre 17 June 1662 Bayle, Jacob, 1644-1685 \n", "2 Bayle, Pierre 25 August 1662 Bayle, Jacob, 1644-1685 \n", "3 Bayle, Pierre 5 March 1663 Bayle, Jacob, 1644-1685 \n", "4 Bayle, Pierre 7 April 1665 Bayle, Jacob, 1644-1685 \n", "... ... ... ... \n", "132205 Beeckman, Isaac 12 April 1632 Croix, Jacques, 1579-1655 \n", "132206 Beeckman, Isaac 17 May 1632 Beeckman, Isaac, 1588-1637 \n", "132207 Beeckman, Isaac 30 May 1633 Beeckman, Isaac, 1588-1637 \n", "132208 Beeckman, Isaac 22 August 1634 Descartes, René, 1596-1650 \n", "132209 Beeckman, Isaac 13 February 1635 Beeckman, Isaac, 1588-1637 \n", "\n", " addressee \\\n", "0 Bayle, Jean, 1609-1685 \n", "1 Bayle, Jean, 1609-1685 \n", "2 Bayle, Jean, 1609-1685 \n", "3 Bayle, Pierre, 1647-1706 \n", "4 Bayle, Jean, 1609-1685 \n", "... ... \n", "132205 Beeckman, Isaac, 1588-1637 \n", "132206 Rivet, André, 1572-1651 \n", "132207 Mersenne, Marin, 1588-1648 \n", "132208 Beeckman, Isaac, 1588-1637 \n", "132209 Beeckman, Abraham, fl. 1635 \n", "\n", " origin \\\n", "0 Puylaurens, Occitanie, France \n", "1 Puylaurens, Occitanie, France \n", "2 Puylaurens, Occitanie, France \n", "3 Puylaurens, Occitanie, France \n", "4 Puylaurens, Occitanie, France \n", "... ... 
\n", "132205 Delft, South Holland, (United Provinces) Nethe... \n", "132206 Dordrecht, South Holland, (United Provinces) N... \n", "132207 Dordrecht, South Holland, (United Provinces) N... \n", "132208 Amsterdam, North Holland, (United Provinces) N... \n", "132209 Dordrecht, South Holland, (United Provinces) N... \n", "\n", " destination \\\n", "0 Carla-Bayle, Occitanie, France \n", "1 Carla-Bayle, Occitanie, France \n", "2 Carla-Bayle, Occitanie, France \n", "3 Carla-Bayle, Occitanie, France \n", "4 Carla-Bayle, Occitanie, France \n", "... ... \n", "132205 Dordrecht, South Holland, (United Provinces) N... \n", "132206 The Hague, South Holland, Netherlands \n", "132207 Paris, Île-de-France, France \n", "132208 Dordrecht, South Holland, (United Provinces) N... \n", "132209 Amsterdam, North Holland, (United Provinces) N... \n", "\n", " repository \n", "0 Collection d'E. Labrousse\\n \\n \\n ... \n", "1 2 printed editions \n", "2 Collection d'E. Labrousse\\n \\n \\n ... \n", "3 1 printed edition \n", "4 Bibliothèque de la Société de l'Histoire du P... \n", "... ... 
\n", "132205 NaN \n", "132206 NaN \n", "132207 NaN \n", "132208 NaN \n", "132209 NaN \n", "\n", "[132210 rows x 10 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# read the merged letters file (tab-separated, despite the .csv extension) into a pandas DataFrame\n", "merged_letters_file = '../data/emlo_letters.csv'\n", "df = pd.read_csv(merged_letters_file, sep='\\t')\n", "\n", "df" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of distinct authors: 14619\n", "number of distinct addressees: 7649\n" ] } ], "source": [ "# show the number of distinct authors and addressees\n", "print('number of distinct authors:', df['author'].nunique())\n", "print('number of distinct addressees:', df['addressee'].nunique())\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Bodleian card catalogue 48668\n", "Groot, Hugo de 8034\n", "Huygens, Constantijn 7120\n", "Hartlib, Samuel 4719\n", "Andreae, Johann Valentin 3696\n", " ... \n", "Beeckman, Isaac 28\n", "Dudley, Anne 27\n", "Vernon, Margaret 21\n", "Baxter, Richard 8\n", "Culpeper, Cheney 3\n", "Name: collection, Length: 93, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The correspondence collections in the dataset with the number of letters they contain.\n", "df.collection.value_counts()\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Groot, Hugo de, 1583-1645 4912\n", "Huygens, Constantijn, 1596-1687 3951\n", "Plantin, Christophe, 1520-1589 2331\n", "Vossius, Gerardus Joannes, 1577-1649 2292\n", "Peiresc, Nicolas-Claude Fabri de, 1580-1637 2111\n", " ... \n", "Farley (Mr), fl. 1775; Rose (Mr), fl. 1775 1\n", "Newton, James, fl. 1710-1713 1\n", "Martinus, William, fl. 
1621 1\n", "Council of State of the Republic of the United Netherlands 1\n", "Montagu, Edward, 1562-1644 1\n", "Name: author, Length: 14619, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The authors in the dataset with the number of letters they sent.\n", "df.author.value_counts()\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Huygens, Constantijn, 1596-1687 4737\n", "Hearne, Thomas, 1678-1735 3795\n", "Vossius, Gerardus Joannes, 1577-1649 3498\n", "Hartlib, Samuel, 1600-1662 3388\n", "Groot, Hugo de, 1583-1645 3233\n", " ... \n", "Blount, Edward, fl. 1724 1\n", "Baert, Pieter J., fl. 1676-1691 1\n", "Werndeley, (Reverend Mr), fl. 1711; Werndeley, sons of, fl. 1711 1\n", "Vossius, Gerardus, 1619-1640 1\n", "Heidelberg, ministers in, fl. 1655 1\n", "Name: addressee, Length: 7649, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The addressees in the dataset with the number of letters they received.\n", "df.addressee.value_counts()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of distinct authors: 14619\n", "number of distinct addressees: 7649\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Enlarge the default figure size so that two plots placed \n", "# next to each other as subplots are still big enough.\n", "plt.rcParams['figure.figsize'] = [15, 5]\n", "\n", "# create a plot canvas with two adjacent subplots\n", "plt.subplot(1,2,1)\n", "# Distribution of number of letters per author\n", "# Sub-plot 1 shows the distribution on linearly scaled axes\n", "df['author'].value_counts().hist(bins=100)\n", "plt.ylabel('Number of authors')\n", "plt.xlabel('Number of letters authored')\n", "\n", "plt.subplot(1,2,2)\n", "# Sub-plot 2 shows the same distribution on a log-scaled y-axis\n", "df['author'].value_counts().hist(bins=100)\n", "plt.ylabel('Number of authors')\n", "plt.xlabel('Number of letters authored')\n", "plt.yscale('log')\n", "\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.subplot(1,2,1)\n", "# Number of letters by each letter addressee\n", "df['addressee'].value_counts().hist(bins=100)\n", "plt.xlabel('Number of letters received')\n", "plt.ylabel('Number of addressees')\n", "\n", "\n", "# Distribution of number of letters per addressee\n", "plt.subplot(1,2,2)\n", "df['addressee'].value_counts().hist(bins=100)\n", "plt.ylabel('Number of addressees')\n", "plt.xlabel('Number of letters received')\n", "plt.yscale('log')\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.subplot(1,2,1)\n", "# Distribution of number of letters per author (log-scaled y-axis)\n", "df['author'].value_counts().hist(bins=100)\n", "plt.ylabel('Number of authors')\n", "plt.xlabel('Number of letters authored')\n", "plt.yscale('log')\n", "\n", "\n", "# Distribution of number of letters per addressee (log-scaled y-axis)\n", "plt.subplot(1,2,2)\n", "df['addressee'].value_counts().hist(bins=100)\n", "plt.ylabel('Number of addressees')\n", "plt.xlabel('Number of letters received')\n", "plt.yscale('log')\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from collections import Counter\n", "\n", "# For each letter count n, count how many authors wrote exactly n letters\n", "author_dist = Counter([count for count in df['author'].value_counts()])\n", "x_author, y_author = zip(*author_dist.items())\n", "plt.subplot(1,2,1)\n", "plt.scatter(x_author, y_author)\n", "plt.xscale('log')\n", "plt.yscale('log')\n", "plt.xlabel('Number of letters authored')\n", "plt.ylabel('Number of authors')\n", "\n", "plt.subplot(1,2,2)\n", "# Same for addressees: how many addressees received exactly n letters\n", "addressee_dist = Counter([count for count in df['addressee'].value_counts()])\n", "x_addressee, y_addressee = zip(*addressee_dist.items())\n", "plt.scatter(x_addressee, y_addressee)\n", "plt.xscale('log')\n", "plt.yscale('log')\n", "plt.xlabel('Number of letters received')\n", "plt.ylabel('Number of addressees')\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plots show typical skewed distributions. The vast majority of correspondents author and/or receive only one or a few letters (the tall bar on the left of each histogram is the lowest bin, which includes all authors/addressees authoring or receiving only one letter). Only a handful of people author or receive more than a thousand letters. \n", "\n", "Who are the most prolific authors?" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Groot, Hugo de, 1583-1645 4912\n", "Huygens, Constantijn, 1596-1687 3951\n", "Plantin, Christophe, 1520-1589 2331\n", "Vossius, Gerardus Joannes, 1577-1649 2292\n", "Peiresc, Nicolas-Claude Fabri de, 1580-1637 2111\n", "Oldenburg, Henry, 1619-1677 2092\n", "Scaliger, Joseph Justus, 1540-1609 1847\n", "Huygens, Christiaan, 1629-1695 1423\n", "Hearne, Thomas, 1678-1735 1382\n", "Wallis, John (Dr), 1616-1703 1290\n", "Smith, Thomas (Dr), 1638-1710 1114\n", "Dury, John, 1596-1680 1003\n", "Bayle, Pierre, 1647-1706 938\n", "Bourignon, Antoinette, 1616-1680 887\n", "Montagu, Mary Wortley (Lady), 1689-1762 860\n", "Aubrey, John, 1626-1697 854\n", "Heinsius, Nicolaas, 1620-1681 833\n", "Descartes, René, 1596-1650 796\n", "Vossius, Isaac (Dr), 1618-1689 793\n", "Brett, Thomas, 1667-1744 742\n", "Name: author, dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['author'].value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the list above, most of the authors are the central figure or eponym of one of the EMLO collections.\n", "\n", "Exceptions are:\n", "\n", "- John Dury (1596-1680): Preacher and ecumenist\n", "- August II of Braunschweig-Wolfenbüttel (1579-1666): Duke (Herzog) of Braunschweig-Wolfenbüttel\n", "\n", "These are prolific authors in collections centred on someone else. \n", "\n", "We first look at the letters of August II. Which collections are they part of?" 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collections with August II of Braunschweig-Wolfenbüttel's letters:\n" ] }, { "data": { "text/plain": [ "Andreae, Johann Valentin 582\n", "Kircher, Athanasius 21\n", "Bodleian card catalogue 1\n", "Braunschweig-Wolfenbüttel, Sophia Hedwig von 1\n", "Name: collection, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Collections with August II of Braunschweig-Wolfenbüttel's letters:\")\n", "df[df['author'] == 'August II of Braunschweig-Wolfenbüttel, 1579-1666']['collection'].value_counts()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we look at who these letters are addressed to:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Addressees of August II of Braunschweig-Wolfenbüttel's letters:\n" ] }, { "data": { "text/plain": [ "Andreae, Johann Valentin, 1586-1654 581\n", "Kircher, Athanasius, 1601-1680 21\n", "Württemberg, Eberhard III von, 1614-1674 1\n", "Braunschweig-Lüneburg, Georg, 1582-1641 1\n", "Braunschweig-Wolfenbüttel, Sophia Hedwig von, 1592-1642 1\n", "Name: addressee, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Addressees of August II of Braunschweig-Wolfenbüttel's letters:\")\n", "df[df['author'] == 'August II of Braunschweig-Wolfenbüttel, 1579-1666']['addressee'].value_counts()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These two queries reveal a typical pattern in these collections. August II has 582 letters in the collection of Johann Valentin Andreae, of which 581 are also addressed to Andreae. 
Letters in a collection around a certain person tend to be either authored by or addressed to this person, which makes sense from a recordkeeping perspective. But there is one letter addressed to someone else, namely Eberhard III von Württemberg.\n", "\n", "Now, let us look at the same queries for John Dury's letters:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collections with John Dury's letters:\n" ] }, { "data": { "text/plain": [ "Hartlib, Samuel 837\n", "Bodleian card catalogue 149\n", "Ussher, James 6\n", "Mede, Joseph 3\n", "Huygens, Constantijn 2\n", "Culpeper, Cheney 2\n", "Boyle, Robert 2\n", "Vossius, Gerardus Joannes 1\n", "Bisterfeld, Johann Heinrich 1\n", "Name: collection, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Collections with John Dury's letters:\")\n", "df[df['author'] == 'Dury, John, 1596-1680']['collection'].value_counts()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "John Dury has letters in nine different collections, but in seven of those there are only a handful of letters. We can also see who he addressed those letters to:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Addressees of John Dury's letters:\n" ] }, { "data": { "text/plain": [ "Hartlib, Samuel, 1600-1662 528\n", "Roe, Thomas (Sir), 1581-1644 31\n", "Culpeper, Cheney, 1601-1663 11\n", "Borthwick, Eleazar, fl. 1633-1642 7\n", "St Amand, Joseph, fl. 1636-1643 7\n", " ... \n", "House of Commons (1641-1712) 1\n", "St Gallen and Appenzell, Clergy in, fl. 1654 1\n", "Cecil, Elizabeth, fl. 1640 1\n", "Coysh, Joseph, fl. 
1652 1\n", "Rusdorf, Johann Joachim von, 1589-1640 1\n", "Name: addressee, Length: 135, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Addressees of John Dury's letters:\")\n", "df[df['author'] == 'Dury, John, 1596-1680']['addressee'].value_counts()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we see a different pattern. Samuel Hartlib is by far the most frequent addressee of John Dury's letters in these collections. But looking at the two sets of counts above, we note that John Dury authored 837 letters in the Samuel Hartlib collection, of which only 528 are addressed to Samuel Hartlib. Who are the other 309 letters in the Samuel Hartlib collection addressed to? " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Addressees of John Dury's letters in the Samuel Hartlib:\n" ] }, { "data": { "text/plain": [ "Hartlib, Samuel, 1600-1662 528\n", "Roe, Thomas (Sir), 1581-1644 30\n", "Culpeper, Cheney, 1601-1663 9\n", "Waller, William (Sir), 1598-1668 7\n", "Borthwick, Eleazar, fl. 1633-1642 7\n", " ... \n", "Ames, William, 1576-1633 1\n", "Ancelin, fl. 
1660 1\n", "Bedell, William, 1572-1642 1\n", "Figulus, Petr, 1619-1670 1\n", "Palmer, Herbert, 1601-1647 1\n", "Name: addressee, Length: 127, dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Addressees of John Dury's letters in the Samuel Hartlib:\")\n", "df[(df['author'] == 'Dury, John, 1596-1680') & (df['collection'] == 'Hartlib, Samuel')]['addressee'].value_counts()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apparently, some collections also contain hundreds of letters that are neither authored by nor addressed to the collection eponym.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analyzing the Addressees" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Huygens, Constantijn, 1596-1687 4737\n", "Hearne, Thomas, 1678-1735 3795\n", "Vossius, Gerardus Joannes, 1577-1649 3498\n", "Hartlib, Samuel, 1600-1662 3388\n", "Groot, Hugo de, 1583-1645 3233\n", "Lhwyd, Edward, 1659-1709 3226\n", "Charlett, Arthur (Reverend), 1655-1722 3066\n", "Andreae, Johann Valentin, 1586-1654 2953\n", "Noble, Mark (Reverend), 1754-1827 2809\n", "Sancroft, William, 1617-1693 2797\n", "Kircher, Athanasius, 1601-1680 2209\n", "Oldenburg, Henry, 1619-1677 2127\n", "D'Orville, Jacques Philippe, 1696-1751 2066\n", "Brett, Thomas, 1667-1744 1767\n", "Lister, Martin, 1639-1712 1704\n", "Vossius, Isaac (Dr), 1618-1689 1690\n", "Solms-Braunfels, Amalia von, 1602-1675 1660\n", "Smith, Thomas (Dr), 1638-1710 1637\n", "Wood, Anthony, 1632-1695 1547\n", "Scaliger, Joseph Justus, 1540-1609 1512\n", "Name: addressee, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['addressee'].value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the list above, most of the addressees are the central figure or eponym of one of the EMLO collections.\n", "\n", 
"Exceptions are:\n", "\n", "- Nicolaas Reigersberch (1584-1654): brother-in-law of Hugo de Groot; jurist\n", "- Willem de Groot (1597-1662): brother of Hugo de Groot (1583-1645); Dutch jurist\n", "\n", "These are frequent addressees in collections centred on someone else. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collections with letters to Nicolaas Reigersberch:\n" ] }, { "data": { "text/plain": [ "Groot, Hugo de 881\n", "Vossius, Gerardus Joannes 8\n", "Bodleian card catalogue 6\n", "Name: collection, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Collections with letters to Nicolaas Reigersberch:\")\n", "df[df['addressee'] == 'Reigersberch, Nicolaas, 1584-1654']['collection'].value_counts()\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Authors of letters to Nicolaas Reigersberch:\n" ] }, { "data": { "text/plain": [ "Groot, Hugo de, 1583-1645 862\n", "Reigersberch, Maria, 1589-1653 18\n", "Vossius, Gerardus Joannes, 1577-1649 14\n", "Groot, Willem de, 1597-1662 1\n", "Name: author, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Authors of letters to Nicolaas Reigersberch:\")\n", "df[df['addressee'] == 'Reigersberch, Nicolaas, 1584-1654']['author'].value_counts()\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collections with letters to Willem de Groot:\n" ] }, { "data": { "text/plain": [ "Groot, Hugo de 732\n", "Vossius, Gerardus Joannes 3\n", "Bodleian card catalogue 1\n", "Name: collection, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Collections with letters to Willem de 
Groot:\")\n", "df[df['addressee'] == 'Groot, Willem de, 1597-1662']['collection'].value_counts()\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Authors of letters to Willem de Groot:\n" ] }, { "data": { "text/plain": [ "Groot, Hugo de, 1583-1645 726\n", "Vossius, Gerardus Joannes, 1577-1649 4\n", "Groot, Johan Hugo de, 1554-1640 4\n", "Groot van Kraayenburg, Dirck de, 1618-1661 2\n", "Name: author, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(\"Authors of letters to Willem de Groot:\")\n", "df[df['addressee'] == 'Groot, Willem de, 1597-1662']['author'].value_counts()\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we see some letters between persons who are not the central figure in any of the EMLO collections. \n", "\n", "How many letters in each collection do not involve the eponym as either author or addressee?\n", "\n", "First, we map the name of the collection to the name as used as author or addressee:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bodleian card catalogue\n" ] } ], "source": [ "eponyms = list(df['collection'].unique())\n", "authors = list(df['author'].unique())\n", "author_counts = df['author'].value_counts()\n", "authors\n", "\n", "best_map = {}\n", "eponym_map = {}\n", "for eponym in eponyms:\n", " #print(eponym)\n", " for author in authors:\n", " if not isinstance(author, str) or ';' in author:\n", " continue\n", " if eponym == 'Fermat, Pierre de' and author == 'Fermat, Pierre, 1601-1665':\n", " eponym_map[eponym] = author\n", " if eponym == 'Comenius, Jan Amos' and author == 'Komenský, Jan Amos, 1592-1670':\n", " eponym_map[eponym] = author\n", " if eponym in author[:len(eponym)]:\n", " if eponym not in best_map or author_counts[author] > 
best_map[eponym]:\n", " best_map[eponym] = author_counts[author]\n", " eponym_map[eponym] = author\n", " if eponym not in eponym_map:\n", " print(eponym)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collection:\t\t\t\t\t\tAll letters\tNon-eponym letters\n", "----------------------------------------------------------------------------------------\n", "Bayle, Pierre \t1791\t\t133\t(0.07)\n", "Sirleto, Guglielmo \t1438\t\t15\t(0.01)\n", "Seidenbecher, Georg Lorenz \t47\t\t0\t(0.00)\n", "Swammerdam, Jan \t172\t\t4\t(0.02)\n", "Fermat, Pierre de \t121\t\t6\t(0.05)\n", "Ortelius, Abraham \t467\t\t0\t(0.00)\n", "Reneri, Henricus \t61\t\t0\t(0.00)\n", "Spinoza, Baruch \t58\t\t1\t(0.02)\n", "Lister, Martin \t1212\t\t2\t(0.00)\n", "Wallis, John \t1998\t\t232\t(0.12)\n", "Ussher, James \t681\t\t17\t(0.02)\n", "Groot, Hugo de \t8034\t\t280\t(0.03)\n", "Franckenberg, Abraham von \t85\t\t1\t(0.01)\n", "Bourignon, Antoinette \t940\t\t0\t(0.00)\n", "Peiresc, Nicolas-Claude Fabri de \t1939\t\t11\t(0.01)\n", "Hartlib, Samuel \t4719\t\t1221\t(0.26)\n", "Braunschweig-Wolfenbüttel, Sophia Hedwig von \t169\t\t1\t(0.01)\n", "Solms-Braunfels, Amalia von \t1184\t\t3\t(0.00)\n", "Comenius, Jan Amos \t571\t\t41\t(0.07)\n", "Selden, John \t355\t\t10\t(0.03)\n", "Culpeper, Cheney \t3\t\t0\t(0.00)\n", "Montagu, Mary Wortley \t963\t\t0\t(0.00)\n", "Nierop, Dirck Rembrantsz van \t80\t\t4\t(0.05)\n", "Plantin, Christophe \t3030\t\t324\t(0.11)\n", "Oldenburg, Henry \t3176\t\t34\t(0.01)\n", "Pontanus, Johannes Isacius \t321\t\t2\t(0.01)\n", "Dodington, John \t571\t\t8\t(0.01)\n", "Stuart, Arbella \t118\t\t2\t(0.02)\n", "Vives, Juan Luis \t195\t\t19\t(0.10)\n", "Jungius, Joachim \t506\t\t18\t(0.04)\n", "Coccejus, Johannes \t515\t\t2\t(0.00)\n", "Opitz, Martin \t110\t\t0\t(0.00)\n", "Agustín, Antonio \t579\t\t5\t(0.01)\n", "Andreae, Johann Valentin \t3696\t\t81\t(0.02)\n", "Anhalt-Dessau, Henriette Amalia von 
\t1352\t\t1\t(0.00)\n", "Thomson, Richard \t78\t\t2\t(0.03)\n", "Schott, Caspar \t180\t\t0\t(0.00)\n", "Permeier, Johann \t89\t\t3\t(0.03)\n", "Vossius, Isaac \t1703\t\t2\t(0.00)\n", "Pascal, Blaise \t49\t\t6\t(0.12)\n", "Magini, Giovanni Antonio \t100\t\t1\t(0.01)\n", "Vossius, Gerardus Joannes \t3430\t\t14\t(0.00)\n", "Bernegger, Matthias \t435\t\t0\t(0.00)\n", "Scaliger, Joseph Justus \t3338\t\t8\t(0.00)\n", "Mengoli, Pietro \t40\t\t0\t(0.00)\n", "Plot, Robert \t108\t\t6\t(0.06)\n", "Hilchen, David \t98\t\t22\t(0.22)\n", "Sidney, Philip \t380\t\t5\t(0.01)\n", "Boyle, Robert \t1759\t\t9\t(0.01)\n", "Jurin, James \t701\t\t0\t(0.00)\n", "Sachs von Löwenheim, Philipp Jakob \t143\t\t0\t(0.00)\n", "Kepler, Johannes \t883\t\t57\t(0.06)\n", "Beverland, Hadriaan \t305\t\t0\t(0.00)\n", "Reland, Adriaan \t211\t\t0\t(0.00)\n", "Bisterfeld, Johann Heinrich \t121\t\t3\t(0.02)\n", "Baxter, Richard \t8\t\t0\t(0.00)\n", "Lhwyd, Edward \t2128\t\t74\t(0.03)\n", "Euler, Leonhard \t811\t\t35\t(0.04)\n", "Halley, Edmond \t245\t\t0\t(0.00)\n", "Aubrey, John \t1073\t\t10\t(0.01)\n", "Conway, Anne \t296\t\t78\t(0.26)\n", "Mersenne, Marin \t1904\t\t784\t(0.41)\n", "Milton, John \t66\t\t0\t(0.00)\n", "Rabus, Pieter \t30\t\t0\t(0.00)\n", "Mede, Joseph \t441\t\t9\t(0.02)\n", "Pennant, Thomas \t508\t\t1\t(0.00)\n", "Hobbes, Thomas \t223\t\t10\t(0.04)\n", "Beale, Robert \t101\t\t27\t(0.27)\n", "Huygens, Christiaan \t3080\t\t393\t(0.13)\n", "Claude, Jean \t117\t\t4\t(0.03)\n", "Ruar, Martin \t100\t\t17\t(0.17)\n", "Gray, Thomas \t651\t\t7\t(0.01)\n", "Oranje-Nassau, Albertine Agnes van \t782\t\t48\t(0.06)\n", "Schurman, Anna Maria van \t244\t\t1\t(0.00)\n", "Rich, Penelope \t42\t\t2\t(0.05)\n", "Newton, Isaac \t1140\t\t135\t(0.12)\n", "Clifford, Margaret \t131\t\t0\t(0.00)\n", "Kircher, Athanasius \t2693\t\t3\t(0.00)\n", "Dudley, Anne \t27\t\t1\t(0.04)\n", "Ashmole, Elias \t764\t\t27\t(0.04)\n", "Vernon, Francis \t275\t\t0\t(0.00)\n", "Collins, John \t273\t\t88\t(0.32)\n", "Descartes, René 
\t727\t\t7\t(0.01)\n", "Huygens, Constantijn \t7120\t\t6\t(0.00)\n", "Bacon, Anne \t197\t\t2\t(0.01)\n", "Gruter, Jan \t136\t\t0\t(0.00)\n", "Worthington, John \t174\t\t8\t(0.05)\n", "Brahe, Tycho \t505\t\t28\t(0.06)\n", "Rubens, Peter Paul \t940\t\t558\t(0.59)\n", "Hutton, Charles \t133\t\t6\t(0.05)\n", "Vernon, Margaret \t21\t\t0\t(0.00)\n", "Beeckman, Isaac \t28\t\t1\t(0.04)\n" ] } ], "source": [ "# For each collection, count the letters that are neither authored by nor addressed to its eponym\n", "print(\"Collection:\\t\\t\\t\\t\\t\\tAll letters\\tNon-eponym letters\")\n", "print(\"----------------------------------------------------------------------------------------\")\n", "for eponym in eponym_map:\n", " epo_df = df[df['collection'] == eponym]\n", " #print(eponym, '\\t', eponym_map[eponym])\n", " non_epo_df = df[(df['collection'] == eponym) & (df['author'] != eponym_map[eponym]) & (df['addressee'] != eponym_map[eponym])]\n", " perc = non_epo_df.shape[0] / epo_df.shape[0]\n", " print(f\"{eponym: <50}\\t{epo_df.shape[0]}\\t\\t{non_epo_df.shape[0]}\\t({perc:.2f})\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most collections consist almost exclusively of letters involving the eponym, but some collections are very different. In the Peter Paul Rubens collection, the majority (59%) of letters are between people other than Rubens. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>collection</th>\n", "      <th>author</th>\n", "      <th>addressee</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>131088</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Moretus, Balthasar, 1574-1641</td>\n", "      <td>Rubens, Philip, 1574-1611</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131089</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Rubens, Philip, 1574-1611</td>\n", "      <td>Rubens, Peter Paul, 1577-1640</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131090</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Albert VII, Archduke of Austria, 1559-1621</td>\n", "      <td>Richardot, Jean, 1570-1614</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131091</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Gonzaga, Vincenzo I, 1562-1612</td>\n", "      <td>Damasceni Peretti, Alessandro, 1571-1623</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131092</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Damasceni Peretti, Alessandro, 1571-1623</td>\n", "      <td>Gonzaga, Vincenzo I, 1562-1612</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131093</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Arrigoni, Lelio, b.1541</td>\n", "      <td>Chieppio, Annibal, 1563-1623</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131094</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Rubens, Philip, 1574-1611</td>\n", "      <td>Rubens, Peter Paul, 1577-1640</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131095</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Arrigoni, Lelio, b.1541</td>\n", "      <td>Chieppio, Annibal, 1563-1623</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131096</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Richardot, Jean, 1570-1614</td>\n", "      <td>Gonzaga, Vincenzo I, 1562-1612</td>\n", "    </tr>\n", "    <tr>\n", "      <th>131097</th>\n", "      <td>Rubens, Peter Paul</td>\n", "      <td>Arrigoni, Lelio, b.1541</td>\n", "      <td>Chieppio, Annibal, 1563-1623</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " collection author \\\n", "131088 Rubens, Peter Paul Moretus, Balthasar, 1574-1641 \n", "131089 Rubens, Peter Paul Rubens, Philip, 1574-1611 \n", "131090 Rubens, Peter Paul Albert VII, Archduke of Austria, 1559-1621 \n", "131091 Rubens, Peter Paul Gonzaga, Vincenzo I, 1562-1612 \n", "131092 Rubens, Peter Paul Damasceni Peretti, Alessandro, 1571-1623 \n", "131093 Rubens, Peter Paul Arrigoni, Lelio, b.1541 \n", "131094 Rubens, Peter Paul Rubens, Philip, 1574-1611 \n", "131095 Rubens, Peter Paul Arrigoni, Lelio, b.1541 \n", "131096 Rubens, Peter Paul Richardot, Jean, 1570-1614 \n", "131097 Rubens, Peter Paul Arrigoni, Lelio, b.1541 \n", "\n", " addressee \n", "131088 Rubens, Philip, 1574-1611 \n", "131089 Rubens, Peter Paul, 1577-1640 \n", "131090 Richardot, Jean, 1570-1614 \n", "131091 Damasceni Peretti, Alessandro, 1571-1623 \n", "131092 Gonzaga, Vincenzo I, 1562-1612 \n", "131093 Chieppio, Annibal, 1563-1623 \n", "131094 Rubens, Peter Paul, 1577-1640 \n", "131095 Chieppio, Annibal, 1563-1623 \n", "131096 Gonzaga, Vincenzo I, 1562-1612 \n", "131097 Chieppio, Annibal, 1563-1623 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['collection'] == 'Rubens, Peter Paul'][['collection','author','addressee']].head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Normalization and Classification for Creating and Comparing Groups\n", "\n", "The metadata is fairly minimal if we consider only the fields in the dataset, but more can be derived from them.\n", "\n", "- The names of senders and recipients include birth and death years (in most cases), so we could use these to group persons by age at death or by birth decade. \n", "\n", "- The dates on which the letters were sent are often exact to the day, but sometimes only the month is known, or only an earliest and a latest probable date. 
We can normalise those dates to see when letters were sent, by year or by month. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normalization and Scale\n", "\n", "At a small scale, there is no need to normalize data, as the researcher can do that mentally while working with the materials.\n", "\n", "At an intermediate scale of hundreds or thousands of documents, variations in the names of persons and places and in the ways dates are recorded become a hurdle to analysis. For topical analysis, this is also an issue, as many connections between documents are hard to bring to the surface because of morphological and spelling variations. \n", "\n", "At a large scale with hundreds of thousands or millions of documents, the textual variations become less of a hurdle, as there is enough data to identify and map variants. \n", "\n", "At a very large scale with tens or hundreds of millions of documents, the textual variations become meaningful and allow measuring contextual nuance in how word variants are used to convey different aspects." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9 January 1638 14\n", "Unknown date 14\n", "12 December 1643 12\n", "20 June 1643 11\n", "6 February 1644 11\n", " ..\n", "4 April 1634 1\n", "3 November 1628 1\n", "9 June 1628 1\n", "20 August 1627 1\n", "11 January 1609 1\n", "Name: date, Length: 4067, dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.collection == 'Groot, Hugo de'].date.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 4067 distinct date values in the Hugo de Groot collection. The most common exact date is 9 January 1638 (14 letters), and another 14 letters have an unknown date. 
" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "def is_day_month_year(sent_date):\n", "    return re.match(r'^\\d+ (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\w* \\d{4}$', sent_date) is not None\n", "\n", "def is_month_year(sent_date):\n", "    return re.match(r'^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\w* \\d{4}$', sent_date) is not None\n", "\n", "def is_year(sent_date):\n", "    return re.match(r'^\\d{4}$', sent_date) is not None\n", "\n", "def is_day_month(sent_date):\n", "    return re.match(r'^\\d+ (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\w*$', sent_date) is not None\n", "\n", "def is_month(sent_date):\n", "    return re.match(r'^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\w*$', sent_date) is not None\n", "\n", "def is_day_year(sent_date):\n", "    return re.match(r'^\\d+ \\d{4}$', sent_date) is not None\n", "\n", "def get_year(sent_date):\n", "    if is_day_month_year(sent_date) or is_year(sent_date) or is_day_year(sent_date) or is_month_year(sent_date):\n", "        return int(sent_date[-4:])\n", "    else:\n", "        return None\n", "\n", "def get_month(sent_date):\n", "    if is_day_month_year(sent_date) or is_month(sent_date) or is_day_month(sent_date) or is_month_year(sent_date):\n", "        match = re.match(r'.*((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\w*).*', sent_date)\n", "        return match.group(1)\n", "    else:\n", "        return None\n", "\n", "\n", "def get_date_type(sent_date):\n", "    if is_day_month_year(sent_date):\n", "        return 'day_month_year'\n", "    if is_month_year(sent_date):\n", "        return 'month_year'\n", "    if is_year(sent_date):\n", "        return 'year'\n", "    if is_day_month(sent_date):\n", "        return 'day_month'\n", "    if is_month(sent_date):\n", "        return 'month'\n", "    if is_day_year(sent_date):\n", "        return 'day_year'\n", "    if 'Between' in sent_date:\n", "        return 'range_between'\n", "    if 'On or before' in sent_date:\n", "        return 'range_before'\n", "    if 'On or after' in sent_date:\n", "        return 
'range_after'\n", "    if 'Unknown date' in sent_date:\n", "        return 'unknown'\n", "    else:\n", "        return 'invalid format'\n", "\n", "# classify each date string and extract the year and month where available\n", "df['date_type'] = df.date.apply(get_date_type)\n", "df['date_year'] = df.date.apply(get_year)\n", "df['date_month'] = df.date.apply(get_month)\n", "\n" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "day_month_year 114636\n", "unknown 5748\n", "month_year 3778\n", "year 3103\n", "day_month 3053\n", "range_between 1700\n", "range_before 104\n", "range_after 44\n", "day_year 37\n", "invalid format 7\n", "Name: date_type, dtype: int64" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.date_type.value_counts()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "322.0" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.date_year.max() - df.date_year.min() + 1" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df.date_year.hist(bins=322)#.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "March 11031\n", "August 10507\n", "July 10364\n", "May 10335\n", "April 10239\n", "January 10139\n", "September 9957\n", "June 9951\n", "October 9884\n", "February 9855\n", "November 9640\n", "December 9565\n", "Name: date_month, dtype: int64" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.date_month.value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "author addressee \n", "Albert VII, Archduke of Austria, 1559-1621 Gonzaga, Vincenzo I, 1562-1612 1\n", "Rubens, Peter Paul, 1577-1640 Vosbergen, Josias van, 1593-1628 1\n", "Mello, Francisco Manuel de, 1608-1666 Chambre des Comptes (Spanish Netherlands) 1\n", "Mennes, John (Sir), 1599-1671 Admiralty Court, England 1\n", "Moncada, Francisco, 1586-1635 Philip IV, King of Spain, 1605-1665 1\n", " ..\n", "Rubens, Peter Paul, 1577-1640 Fabri, Palamède, 1582-1645 18\n", " Olivares, Gaspar de Guzmán, Conde de, 1587-1645 25\n", "Ferdinand, archiduc d'Autriche, 1609-1641 Philip IV, King of Spain, 1605-1665 36\n", "Rubens, Peter Paul, 1577-1640 Dupuy, Pierre, 1582-1651 71\n", "Peiresc, Nicolas-Claude Fabri de, 1580-1637 Rubens, Peter Paul, 1577-1640 85\n", "Length: 301, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { 
"image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df_rubens = df[df.collection == 'Rubens, Peter Paul']\n", "\n", "df_rubens[['collection','author','addressee']].head(10)\n", "\n", "g = df_rubens.groupby(['author', 'addressee']).size()\n", "\n", "u = g.unstack('author')\n", "\n", "plt.imshow(u, cmap='hot', interpolation='nearest')\n", "\n", "g.sort_values()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connections between collections\n", "\n", "How many connections are there between collections? This is easy with two collections, but becomes more difficult when there are many collections.\n", "\n", "Which persons appear in multiple collections?" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Unknown 37\n", "Oldenburg, Henry, 1619-1677 16\n", "Huygens, Constantijn, 1596-1687 14\n", "Mersenne, Marin, 1588-1648 13\n", "Gronovius, Johann Frederick, 1611-1671 12\n", "Groot, Hugo de, 1583-1645 11\n", "Leibniz, Gottfried Wilhelm, 1646-1716 11\n", "Vossius, Gerardus Joannes, 1577-1649 11\n", "Descartes, René, 1596-1650 10\n", "Huygens, Christiaan, 1629-1695 10\n", "Rivet, André, 1572-1651 10\n", "Saumaise, Claude de, 1588-1653 10\n", "Hevelius, Johannes, 1611-1687 9\n", "Boyle, Robert, 1627-1691 9\n", "Hartlib, Samuel, 1600-1662 9\n", "Aubrey, John, 1626-1697 9\n", "Digby, Kenelm (Sir), 1603-1665 9\n", "Gassendi, Pierre, 1592-1655 9\n", "Wallis, John (Dr), 1616-1703 9\n", "Dury, John, 1596-1680 8\n", "Lipsius, Justus, 1547-1606 8\n", "Komenský, Jan Amos, 1592-1670 8\n", "Bernegger, Matthias, 1582-1640 8\n", "Newton, Isaac (Sir), 1642-1727 8\n", "Peiresc, Nicolas-Claude Fabri de, 1580-1637 8\n", "Kircher, Athanasius, 1601-1680 8\n", "Boulliau, Ismaël, 1605-1694 8\n", "Heinsius, Daniel, 1580-1655 8\n", "Sorbière, Samuel, 1615-1670 8\n", "Christina, Queen of Sweden, 1626-1689 7\n", " ..\n", "Manwaring, Robert, fl. 
1637 1\n", "Smith, John, 1630-1679 1\n", "Cort, Christiaan de, 1611-1669 1\n", "Aerssen, Cornelis van, 1600-1662 1\n", "Cottereau, N., 1641-1706 1\n", "Conway, Anne, 1631-1679 1\n", "Chieppio, Annibal, 1563-1623 1\n", "Honywood, Robert (Sir), 1601-1686 1\n", "Doublet, Philips, d.1647 1\n", "Beeck, Anna, fl. 1649 1\n", "Montagu, Edward Wortley, 1678-1761 1\n", "Martinengo da Barco, Ascanio, 1539-1600 1\n", "Ghilde, Johan Flud van, fl. 1685 1\n", "Thomas, Robert, b.1681 1\n", "Brassard, Marie, fl. 1685 1\n", "Dalrymple, James, 1619-1695 1\n", "Leslie, John (Sir), 1766-1832 1\n", "Bacon, Arthur, fl. 1652 1\n", "Standfast, William (Reverend), 1683-1754 1\n", "Hoorn, magistrate of, fl. 1616-1617 1\n", "Clenche, Andrew, d.1692 1\n", "Bachcroft, Thomas, 1571-1662; Bainbridge, Thomas, 1574-1646; Brownrigg, Ralph (Dr), 1592-1659; Collins, Samuel (Dr), 1576-1651; Love, Richard, 1596-1661 1\n", "Higgins, Obadiah, 1663-1741 1\n", "Techmannus, Arnoldus, 1594-1666 1\n", "Klage, Thomas, 1598-1664 1\n", "Bodecher, J. W., fl. 1643 1\n", "Bachacius, Martinus, 1539-1612 1\n", "Morgan, Anthony, fl. 1654 1\n", "Finch, John, fl. 1653 1\n", "Ellenmeier, Johann, fl. 
1635 1\n", "Name: author, dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# which authors occur in multiple collections\n", "df[(df[['collection', 'author']].duplicated(keep='first') == False)]['author'].value_counts()\n" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Oldenburg, Henry 1524\n", "Wallis, John 142\n", "Boyle, Robert 107\n", "Huygens, Christiaan 105\n", "Lister, Martin 61\n", "Hartlib, Samuel 46\n", "Newton, Isaac 16\n", "Swammerdam, Jan 8\n", "Milton, John 7\n", "Sachs von Löwenheim, Philipp Jakob 5\n", "Vossius, Isaac 4\n", "Hobbes, Thomas 2\n", "Ashmole, Elias 1\n", "Comenius, Jan Amos 1\n", "Vossius, Gerardus Joannes 1\n", "Coccejus, Johannes 1\n", "Name: collection, dtype: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['author'] == 'Oldenburg, Henry, 1619-1677']['collection'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Samuel Hartlib is in the top 20 of addressees but not in the top 20 of authors:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Samuel Hartlib\n", "\n", "\tnumber of letters sent: 401\n", "\tnumber of letters received: 3388\n" ] } ], "source": [ "print('Samuel Hartlib\\n')\n", "print(f'\\tnumber of letters sent:', df[df['author'] == 'Hartlib, Samuel, 1600-1662'].shape[0])\n", "print(f'\\tnumber of letters received:', (df[df['addressee'] == 'Hartlib, Samuel, 1600-1662'].shape[0]))\n" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd0XNW1+PHvVpesZllyQZItd2OMq7BNjGvokBgC5EfC\njxBCnoFA8hKS/ELyeO+RBF56KCt5BAMBA6GXQIAEMNjYxlXuXd2SjG31bkkzmvP7Y+6Ika0yVWW0\nP2tpeebMufcenWVtHZ177j5ijEEppVToCuvvBiillAouDfRKKRXiNNArpVSI00CvlFIhTgO9UkqF\nOA30SikV4jTQK6VUiNNAr5RSIU4DvVJKhbiI/m4AQGpqqsnKyurvZiil1KCyc+fOSmNMWm/1BkSg\nz8rKIicnp7+boZRSg4qIHPOknk7dKKVUiNNAr5RSIU4DvVJKhTgN9EopFeI00CulVIjTQK+UUiFO\nA71SSoU4DfRKKdWPapraeHN3GcHc1tXjQC8i4SKyW0Tesd6PF5FtIpIvIi+LSJRVHm29z7c+zwpO\n05VSavB7aUcpP3h5L4dO1AftGt6M6P8dOOz2/jfAQ8aYSUANcJtVfhtQY5U/ZNVTSinVhbzyBgDW\nH60I2jU8CvQikgFcBTxpvRdgBfCaVWUNcI31eqX1HuvzL1r1lVJKnaGgvBGA9UfLg3YNT0f0DwP/\nD3BY70cAtcYYu/W+DEi3XqcDpQDW53VWfaWUUm6MMRRUNBERJuwqqaWu2RaU6/Qa6EXkaqDcGLMz\nkBcWkVUikiMiORUVwfuTRSmlBqqT9S00ttq5euYY2h2GjfnBiYWejOgXAV8WkWLgJZxTNo8AySLi\nyn6ZARy3Xh8HMgGsz5OAqjNPaoxZbYzJNsZkp6X1mmVTKaVCTr41bXP9vEyS4yKDNk/fa6A3xvzU\nGJNhjMkCbgQ+NsbcBKwDrreq3QK8Zb1+23qP9fnHJpjrhpRSapByBfopo+NZPDmN9UcrcDgCHy79\nWUf/E+AeEcnHOQf/lFX+FDDCKr8HuNe/JiqlVGjKL28kMSaCtPholk9No7KxNSjLLL3aeMQYsx5Y\nb70uBOZ3UacFuCEAbVNKqZCWX97IpJHxiAhLpjinsNcdKWdGelJAr6NPxiqlVD8pqHAGeoDU+Ghm\nZSSxPjfw8/Qa6JVSqh/UNrdR2djWEegBlk4dye6SGmqb2wJ6LQ30SinVD1w3Yt0D/fKpaTgMbMir\nDOi1NNArpVQ/KKiwAn1aQkfZzIxkhsdFsv5IYJ+S1UCvlFL9IL+8keiIMNKHx3aUhYcJS6ek8Ulu\nYJdZaqBXSql+kF/eyIS0eMLDOqcCWzZ1JFVNbew/Xhewa2mgV0qpfpBf0cjEtGFnlS+ZkoZIYLNZ\naqBXSqk+1mJrp6zmdKcbsS4pw6KYlZHMugBms9RAr5RSfaygohFj6DLQAyyfOpK9ZbVUNwVmmaUG\neqWU6mNdLa10t2xqGsbAhgA9PKWBXiml+lhBeSNhAuNTz56jBzg/PYkRw6ICthmJBnqllOpj+RWN\njE2JIzoivMvPw9yWWbYHYJmlBnqllOpjrmRmPVk2bSQ1zTb2ldX6fT0N9Eop1Yfs7Q6KKpuY2Eug\nXzI5lTCBdQFYZqmBXiml+lBJdTO2dsOktJ4DfXJcFHPGDueTAMzTa6BXSqk+1NuKG3ezMpIpqGjy\n+5qebA4eIyLbRWSviBwUkZ9b5c+ISJGI7LG+ZlvlIiKPiki+iOwTkbl+t1IppUJEvpXMrLepG4Ck\n2EgaW+3Y2x1+XdOTHaZagRXGmEYRiQQ2icg/rc9+bIx57Yz6VwCTra8FwGPWv0opNeTllzcyKjGa\nxJjIXusmxTpDdH2LnZRhUT5f05PNwY0xptF6G2l99bTeZyXwrHXcViBZRMb43EKllAohBRVNHk3b\nACTFOX8Z1J+2+XVNj+boRSRcRPYA5cCHxpht1kcPWtMzD4lIt
FWWDpS6HV5mlZ15zlUikiMiORUV\ngd86SymlBhpjDAXljUzs5Uasi2vUX9cXgd4Y026MmQ1kAPNFZAbwU2AacAGQAvzEmwsbY1YbY7KN\nMdlpaWleNlsppQafU/WtNLbaPR/Rx/ZhoHcxxtQC64DLjTEnrOmZVuBpYL5V7TiQ6XZYhlWmlFJD\nWseKGw9H9K5AX98S5EAvImkikmy9jgUuAY645t1FRIBrgAPWIW8D37BW3ywE6owxJ/xqpVJKhYD8\n8gbAs6WVAIkBGtF7supmDLBGRMJx/mJ4xRjzjoh8LCJpgAB7gDus+u8BVwL5QDNwq18tVEqpEJFf\n0UhCTARpCdG9VyZwUze9BnpjzD5gThflK7qpb4C7/GqVUkqFIFeOG+dESO9iIsOJigij/rTdr+vq\nk7FKqUHPOb4c+PLLmzyen3dJjIns25uxSik10Bw4Xse8B9by5u6y/m5Kj+qabVQ2tno8P++SFBvR\nN+volVJqoPrrp0VUN7Xxo1f38cHBk/3dnG7lV3h3I9YlKTYy+KtulFJqoKprtvHuvhNcOyedGeck\ncvcLu/k0v7K/m9Ulb5KZuUuM1akbpdQQ9vquMlrtDr69eDzP3DqfrNQ4/u3ZHHaX1PR3086SX95I\nVEQYGcPjvDouSQO9UmqoMsbwwvYSZmUmc945SQwfFsVzty0gNT6abz69gyMn6/u7iZ3klzcyIXUY\n4WGerbhxSYqN1Dl6pdTQtKO4hvzyRm6aP7ajbFRiDH/79gJiIsO4+antFFf6n8s9UPIret8+sCuJ\nMZHUt9j9WlmkgV4pNSi9sO0YCdERXD2rc3LczJQ4nr9tAfZ2Bzc9uY0Tdac9Pmdtcxtv7/2MH726\nl+e2FAesrS22dspqTnuczMxdUmwk7Q5DY6vva+k9eTJWKaUGlJqmNt47cJIbL8gkLursMDZ5VAJr\nvjWfrz+xjesf28KiSSPISh1G1gjn17gRcQyLjsDhMOw/Xsf6oxWszy1nb2ktDgNhAu/uO8HKOeke\n5Y3vTX55I8Z4fyMWOj8dm+BjWzTQK6UGndd3ldFmd/D1BWO7rTMzI5lnbr2A375/lI+PVFDZ2Hmd\n/ciEaNodhqqmNkRgZnoSd6+YzLKpaYSLsPLPn/LGzjK+uWi83+19aUcJkeHC/PEpXh+b6Np85LQd\nhvt2fQ30SqlBxXUTdu7YZKaNTuyxbnZWCq/cfiEAja12iiubOFbVTHFVE8WVTTgMLJ6cyuLJqYyI\n75x/ZnZmMs9tPcYtX8jyOGVBV07Vt/DKjjKun5fJqMQYr48PRGIzDfRKqUFlW1E1hRVN/O76mV4d\nFx8dwYz0JGakJ3lU/+aF4/jhq3vZUljFFyam+tJUAFZvKKTdGO5cOtGn4wOR2ExvxiqlBpUXtpWQ\nEBPB1TPPCep1rpo5huS4SJ7fesznc1Q1tvK3bcdYOfscxo7wbv28i+segT9Px2qgV0oNGtVNbfzr\nwEmum5tBbFR4UK8VExnO/8nO5P2DpzhV3+LTOZ7aVESr3cF3lk3yuR2B2DdWA71SatB4bWcpbe09\n34QNpK8vGIvDGF7cXuL1sXXNNp7dcoyrzh/j02obl/ioCMIkyFM3IhIjIttFZK+IHBSRn1vl40Vk\nm4jki8jLIhJllUdb7/Otz7N8bp1SSlmMMby4vZTsccOZMiqhT645bsQwlk5J48XtJdjaHV4d+/Tm\nIhpb7dy13PfRPEBYmJAQ49/TsZ6M6FuBFcaYWcBs4HJri8DfAA8ZYyYBNcBtVv3bgBqr/CGrnlJK\n+WVLQRVFlU19Npp3uXnhOE7Vt7L20CmPj2losfH0p8VcMn0U547peWWQJ/zNd9NroLc2AG+03kZa\nXwZYAbxmla/BuW8swErrPdbnXxR/1iYppRTwwvYSkmIjufL8Mb1XDqBlU0eSnhzLc17clH1+awl1\np23c7edo3iXogR5ARMJFZ
A9QDnwIFAC1xhjXM7llQLr1Oh0oBbA+rwNG+NxCpdSQ19xm5/2DJ/nK\n3HRiIoN7E/ZM4WHCTQvHsrmgqmNz756cbmvnyY2FLJmSxqzM5IC0ITE2gvoW31MgeBTojTHtxpjZ\nQAYwH5jm8xUtIrJKRHJEJKeiosLf0ymlQlhVYxu2dsP0AEyD+OKr2ZlEhYfx/Nbeb8q+uL2EqqY2\nvrciMKN56KMRvYsxphZYB1wIJIuI64GrDOC49fo4kAlgfZ4EVHVxrtXGmGxjTHZaWpqPzVdKDQUN\n1mg2IaZ/nvFMjY/myvNH8/rOMprbuh9Zt9rbeXxDAQsnpJCd5X26g+4EPdCLSJqIJFuvY4FLgMM4\nA/71VrVbgLes129b77E+/9gMlp17lVIDkitzo69JvQLh5gvH0dBq5609n3Vb59WcMk7Vt/LdFZMD\neu1EP1fdePLrcQywRkTCcf5ieMUY846IHAJeEpEHgN3AU1b9p4DnRCQfqAZu9Ll1SimFcxULONMY\n9Je5Y4czbXQCz205xo0XZHbkv2lus3PgeD17Smt4alMRc8Ym84WJgb0tmRgbSavdQYut3ad7FL32\nmjFmHzCni/JCnPP1Z5a3ADd43RKllOpGf0/dAIgIN184jv948wCPfJTHqfoW9pTWkXuqgXaHc9Ji\n3Ig47rtqul9J0LriyndTf9oWnECvlFL9rWEATN0AXDM7nd/88wgPr80jMSaCWZnJXHLuRGZlJjMr\nM5nUMzJgBoorg2V9i42RPmTA1ECvlBrwXFM3/TmiBxgWHcE7312M3eFgfOqwgI/cu+NvBksN9Eqp\nAa+hxU5kuBAd0f/puXzNQukPfwN9//eaUkr1orHFTkJMZJ+NoAeaxBi3XaZ8oIFeKTXgNbTY+nXF\nTX/TEb1SKuQ1tNj7fX6+P/m7naAGeqXUgNfQOrQDfWR4GHFR4T4/NKWBXik14DW02ImP7t+llf3N\nnzQIGuiVUgNeQ4ut44bkUKWBXikV0hpb7cQP8UCfGBPp8wbhGuiVUgOaMWbI34wF5w3ZOl1eqZQK\nRadt7bQ7TL+nP+hvSbG+Z7DUQK+UGtAarYRmQ3kdPVi7TGmgV0qFovoBkLlyIEiKjaSh1d6RKdMb\nGuiVUgOaa9ORRJ26AfBpVO/JDlOZIrJORA6JyEER+Xer/H4ROS4ie6yvK92O+amI5IvIURG5zOtW\nKaWUpWPTER3RA749HetJz9mBHxpjdolIArBTRD60PnvIGPN798oiMh3nrlLnAecAa0VkijGm3evW\nKaWGvIGw6chA4PqLxpcllr2O6I0xJ4wxu6zXDTj3i03v4ZCVwEvGmFZjTBGQTxc7USmllCcaWwbG\npiP9LSnO9xG9V3P0IpKFc1vBbVbR3SKyT0T+KiLDrbJ0oNTtsDJ6/sWglFLdqh8A+8UOBP5M3Xgc\n6EUkHngd+L4xph54DJgIzAZOAH/w5sIiskpEckQkp6KiwptDlVJDSIMurwTcpm58eGjKo0AvIpE4\ng/zfjDFvABhjThlj2o0xDuAJPp+eOQ5kuh2eYZV1YoxZbYzJNsZkp6Wled1wpdTQ0NhqZ1hUOOFh\nQ3PTEZegjujFuaXLU8BhY8wf3crHuFW7FjhgvX4buFFEokVkPDAZ2O51y5RSCueqm6E+Pw8QExlG\nVHhY0FbdLAJuBvaLyB6r7GfA10RkNmCAYuB2AGPMQRF5BTiEc8XOXbriRinlK81z4yQizqdjfVh1\n02vvGWM2AV39zfReD8c8CDzodWuUUuoMmrnyc4k+pirWJ2OVUgNavbUxuPI9sZkGeqXUgOaco9cR\nPVg56TXQK6VCTWOLnYQhvrTSxdddpjTQK6UGNL0Z+zkN9EqpkGNvd3Da1q5z9Bbnqhs7xniXqlgD\nvVJqwHKlKB7qT8W6JMVG0u4wNLV5t2JdA71SasDSzJWd+fp0rAZ6pdSApYG+s8/z3WigV0q
FCNem\nIzpH76QjeqVUyNERfWeJGuiVUqFGb8Z25uu+sRrolVIDlk7ddKYjeqVUyKnXqZtOEqIjENERvVIq\nhDS22okMF6IjNFQBhIUJCdERHb8APT4uSO1RSim/uTYdce5/pMC5SbhO3SilQobmuTmbL/luPNlK\nMFNE1onIIRE5KCL/bpWniMiHIpJn/TvcKhcReVRE8kVkn4jM9em7UUoNeY0tdl1xcwZfUhV7MqK3\nAz80xkwHFgJ3ich04F7gI2PMZOAj6z3AFTj3iZ0MrAIe86pFSill0RH92YIyojfGnDDG7LJeNwCH\ngXRgJbDGqrYGuMZ6vRJ41jhtBZLP2EhcKaU8Ut9iIz5al1a6C0qgdyciWcAcYBswyhhzwvroJDDK\nep0OlLodVmaVnXmuVSKSIyI5FRUVXjVaKTU0NLbaSdQRfSeJsZFebxDucaAXkXjgdeD7xph698+M\nMzmyVwmSjTGrjTHZxpjstLQ0bw5VSg0ROnVztqTYSFpsDlrtnqcq9ijQi0gkziD/N2PMG1bxKdeU\njPVvuVV+HMh0OzzDKlNKKY8ZY2hstROvgb4TX56O9WTVjQBPAYeNMX90++ht4Bbr9S3AW27l37BW\n3ywE6tymeJRSyiOnbe20O4ymPziDayrLm5U3nvyqXATcDOwXkT1W2c+AXwOviMhtwDHgq9Zn7wFX\nAvlAM3Crx61RSimLZq7s2uepij1/OrbXHjTGbAK6eyzti13UN8BdHrdAKaW64Ar0uo6+M18yWOqT\nsUqpAcmVuTJRp246CcocvVJK9Qeduulax4jeiyWWGuiVUgNSx6YjGug7cf2FU9esgV4pNcjppiNd\ni4oIIzYyXKdulFKDn96M7V6Sl0/HaqBXSg1IGui7522+Gw30SqkBqcFKURweppuOnCkxNkIDvVJq\n8GtoselovhtJsZHUe/HAlAZ6pdSA1NiqCc26k6hTN0qpUKCZK7vn7S5TGuiVUgNSQ6udeF1a2aWk\n2EgaWnXqRik1yDW02HRE3w3X07Ge0kCvlBqQGlp0d6nuJGqgV0qFgkZreaU6m47olVKDnq3dwWlb\nu6Y/6EbAA72I/FVEykXkgFvZ/SJyXET2WF9Xun32UxHJF5GjInKZV61RSimco3nQp2K7kxjrXb94\nMqJ/Bri8i/KHjDGzra/3AERkOnAjcJ51zP+KSLhXLVJKDXmuzJV6M7ZrAR/RG2M2ANUenm8l8JIx\nptUYU4RzO8H5XrVIKTXk1Wvmyh715Rz93SKyz5raGW6VpQOlbnXKrDKllPKYbjrSs9jIcCK8yAHk\na6B/DJgIzAZOAH/w9gQiskpEckQkp6KiwsdmKKVCUaMG+h6JiFejep8CvTHmlDGm3RjjAJ7g8+mZ\n40CmW9UMq6yrc6w2xmQbY7LT0tJ8aYZSKkQ1tOrUTW+CHuhFZIzb22sB14qct4EbRSRaRMYDk4Ht\nvlxDKTV0aS763iV4Eeh77UUReRFYBqSKSBnw38AyEZkNGKAYuB3AGHNQRF4BDgF24C5jTLuX7VdK\nDXE6R9+7L0wcwdse1u21F40xX+ui+Kke6j8IPOjh9ZVS6iwNLXaiwsOIidTV2d35yeXTuNfDuvpk\nrFJqwGlosRGvo/mA0UCvlBpwdNORwNJAr5QacBo0oVlAaaBXSg04jbq7VEBpoFdK+WTd0XKW/HZd\nR7qCQKpvseka+gDSQK+U8snfdx+npLqZnGJPU2F5rqHFToJO3QSMBnqllNccDsOmvEoAthUFPtDr\nzdjA0kCvlPLaoRP1VDW1ER4mbA9woDfGWIFep24CRQO9Usprn+Q6ExFePzeD/WV1NLfZA3bu5rZ2\n2h1G19EHkAZ6pZTXNuZVMH1MIlfOHIPdYdh1rDZg59ZNRwJPA71SyitNrXZ2Hqth8ZRU5o0bTpjA\n9qKqgJ2/QTcdCTgN9Eopr2wpqMLWblg6OY346AhmpCc
F9IZsvSuhma66CRgN9EoNEQ99mMvHR075\nfZ6NeRXERoYzL8u5sdz8rBR2l9bSag9MolrddCTwNNArNQTUNdt49OM8/vRxvt/n2pBXycIJKURH\nODNLzh+fQpvdwb6yOr/PDW656DXQB4wGeqWGgB3F1RgDu0trqW5q8/k8pdXNFFU2sWTK57vCXZCV\nAhCwZZY6Rx94vQZ6a/PvchE54FaWIiIfikie9e9wq1xE5FERybc2Dp8bzMYrpTyzzbpZagx8klvu\n83k25DmXVS6e/HmgHz4siqmjEgI2T6+rbgLPkxH9M8DlZ5TdC3xkjJkMfGS9B7gC5/aBk4FVODcR\nV0r1s21F1VyQNZzU+Cg+PlLh83k25FaQnhzLxLRhncrnj09hZ3E19naHv03tuBk7LEoDfaD0GuiN\nMRuAM39VrwTWWK/XANe4lT9rnLYCyWfsL6uU6mMNLTYOHK/jwgkjWDplJJ8cLfcpINvbHWzOr2Lx\n5FREpNNn88en0NTWzqET9X63t9FKURweJr1XVh7xdY5+lDHmhPX6JDDKep0OlLrVK7PKlFL9JKe4\nBoeBBRNGsGLaSOpb7Owq8f4Bpz2ltTS02jvNz7ssGB+4efqGFptO2wSY3zdjjTEG5ybhXhGRVSKS\nIyI5FRW+/ymplOrZ1qIqIsOFuWOHs3hKKhFhwsdHvJ+n35BbQZjAoompZ302MjGG8anDAjJPr5uO\nBJ6vgf6Ua0rG+tf1v+Y4kOlWL8MqO4sxZrUxJtsYk52WdvYIQSkVGNsKq5mVkUxsVDiJMZFkZw1n\nnS+BPq+SWZnJJMV1vRpmflYKO4qrcTi8Hvd1opkrA8/XQP82cIv1+hbgLbfyb1irbxYCdW5TPEqp\nPtbUamf/8ToWTEjpKFsxbSRHTzVwvPa0x+epbW5jX1ktSyZ3PyibPz6F2mYbueUNfrW5QTcdCThP\nlle+CGwBpopImYjcBvwauERE8oCLrfcA7wGFQD7wBPCdoLRaKeWRncdqaHcYFowf0VG2YtpIAK+m\nbzblV+IwsGTK2dM2LvMDNE/f0GLXh6UCrNfeNMZ8rZuPvthFXQPc5W+jlFKBsbWwivAwYd644R1l\nE9PiyUyJZd2Rcm5eOM6j82zMrSQhJoJZGcnd1skYHss5STFsK6rmGxdm+dzmhlY7iRroA0qfjFUq\nhG0rqub89CSGud3cFBFWTB3J5oJKWmy956cxxrAhr4KLJqUSEd59yBAR5o9PYXtRNc4xn28aWmx6\nMzbANNArFaJOt7Wzr6y20/y8y/JpI2mxOdhS0Ht64YKKRk7UtXR6GrY788ePoKKhleKqZp/abGt3\n0GJz6Bx9gGmgVypE7SqpwdZuWDhhxFmfLZwwgtjIcI/m6T/Jde4N29P8vMvn8/S+5afXzJXBoYFe\nqRC1rbCKMIFst/l5l5jIcBZNSuXjI+W9TrNsyK1gQtowMobH9XrNiWnDGDEsyuf19B2ZK3XqJqA0\n0CsVorYWVjMjPanbaZAV00ZyvPY0eeWN3Z6jxdbOtqKqHpdVunOfp/dFvWauDAoN9Er54bPa07yw\nrYS7X9jFBwdP9ndzOrTY2tlTWtuRmqAry6c5g3d30zfGGP74YS4tNgfLpnr+UOP88SmU1Zz2ap2+\niytzpa66CSztTaW80GZ3kFNczfrcCtYfLSf3lHM0HCZw+EQ9l0wfdVbCr/6wu6SWtnZHp/XzZxqT\nFMu5YxL5+Eg5dyyd2OkzYwy/+ucRVm8o5OaF41jaRX6b7rjm6XcUVZM+x7tUV7rpSHBobyrloWc+\nLeJ37x+lqa2dyHDnFMUN8zJZNjWNncdquPeN/ewurWXu2LPnxPvatqIqROCCHkb0ACumpfGXTwqp\na7Z1pDYwxvA/7x3miY1FfOPCcfz8y+d59ctr2uhEEmMi2FZUzTVeB3qdugkGnbpRfeatPcf5wct7\n/Fpj3V8cDsOf1uU
zaVQCT34jmz3/dSl/+/ZC/m3JBCaPSuCqmWOIjQzn1ZzS3k/WB7YVVnPu6ESS\nYnsOmCumjaTdYTo2FDHG8OC7ziB/iw9BHiA8TLggK4VP8yu9Toesm44EhwZ61SdOt7Xzy3cO8+bu\n46zPHXzZSo+cbKCysY2bF47j4umjOj2ABM4R6BXnj+Yfe09wui0wm2T7qtXezq6Smi6XVZ5pduZw\nhsdFss5affPAu4d5clMR3/xCFvf7EORdrpuXQUl1M09sLPLqOF11Exwa6FWfeH7rMSobW0mIieCx\ndQX93Ryvbcp3/nK6aFL3a8lvmJdJY6udfx3s3zx+e0vraLU7unxQ6kzhYcLSKWmsz63gF+8c4ikr\nyP/3l6b7da/hihmjuWLGaB76MJe8U54nOWtosRMVHkZMZLjP11Zn00Cvgq65zc7jGwq4aFIqP7h4\nCtuLq8kpDsz+on1lY14lk0fGMzoppts6C8anMDYljldzyvqwZWfbVuh8WGl+Vu+BHpxPyVY3tfH0\np8Xcusj/IA/OZZa/vGYG8TER/PDVvR5P4TS02PRGbBBooFdB5xzNt/H9iydz4/xMkuMi+csng2dU\n32JrZ3tRda8pAMLChOvnZbC5oIrSat9SALi8uL2ERb/+mPv+vp+cYu9yx2wrqmba6ASGD4vyqP6y\nKSMZnRjDqiUT+K+r/Q/yLqnx0fxy5Qz2ldXx+IZCj45paNFc9MGggV4FVXObncc/KWTx5FSys1KI\ni4rgm1/IYu3hco6e9C9veVVjK89vPUZ5Q0uAWtu1nOIaWu0OFk/uPQXAdfMyEIHXdvo+qn9rz3F+\n9uZ+oiPDeG1nGdf/ZQuLf7uO371/pNdpEFu7g53HanpcP3+mpLhINt+7gp9deW7Al4ZeNXMMV80c\nw8Nrczlysvf9ZHXTkeDQQK+C6rktx6hqauP7F0/pKLvlwixiI8N53MdR/b6yWu55ZQ8X/upj7vv7\nAX74yt6gruTZmF9BZLh4NOednhzLRZNSeW1nmU87LX10+BQ/fGUvF2Sl8N73FpNz3yX84YZZjE8d\nxmPrC7jkoQ1c+chG/vRxHpvzKzuWI7rsK6vjtK2dBR7ciHUXFsSNuH/x5fNIjInkR6/uxdbLFI5m\nrgwOv3pURIqBBqAdsBtjskUkBXgZyAKKga8aY2r8a6YajJpa7Ty+oZAlU9I65UMfPiyKr80fy5ot\nxdxz6RSPcqi02R3888AJ1mwuZldJLXFR4fyfCzJJio3kT+vyeWPXca6blxGU72NTXiVzxw4nLsqz\nH5fr52VHJrwoAAATLUlEQVTw7y/tYUthFYt6uHl7pi0FVdz5t11MPyeRp27J7rghed28DK6bl0FF\nQyvv7PuMv+/5jN9/kAuACEweGc+sjGRmj03u+Ctpvhcj+mAbER/NA9fM4M6/7eIv6wv47hcnd1u3\nocVOZkrv/x+UdwLxq3O5MabS7f29wEfGmF+LyL3W+58E4DpqkHl2yzGqm5xz82f69uLxPLulmCc3\nFnH/l8/r9hzGGJ7aVMTjGwqpaGgla0Qc/3X1dK7PziAxJhKHw7ClsIpfvnuIpVPTSI2PDuj3UNnY\nysHP6vnxZVM9Puay80aTEBPBqzmlHgf6vaW1fHvNDsalxPHMrfO7fGAoLSGaWxeN59ZF46ltbmNv\nWR17SmrZU1rDR0fKedWaLpo8Mj7g/eCvK84fw5dnncOjH+fxxXNHMf2cxE6ftzsMe0prKG9oPesz\n5b9g/I20ElhmvV4DrEcD/ZDT1Gpn9YYClk5J6/JJ0XOSY7lmTjov7SjhuysmMaKLwNTuMNz39wO8\nuL2Eiyal8tvrZ7J0clqnaYawMOHXXzmfKx/dyC/fOcQjN84J6Pfxab5zDNPTssozxUSGs3L2Obya\nU8YvWmwk9vKUZ+6pBm55ejsp8VE8d9sCUjy4iZocF8XSKWkdqQmMMZRWn2ZPWS0T0
4Z53Na+9PMv\nn8fmgip+9Ope3rp7ETXNbXxytIL1uRVsyquk7rSNMIE5A+DJ4lDjb6A3wAciYoDHjTGrgVFuG4Kf\nBEb5eQ01CK3ZUkxNs63L0bzLHUsn8NrOMtZsLuaeSzuPmFvt7dzz8l7e3X+Cu5ZP5EeXTu32RuHk\nUQnctXwSD6/N45rZ6Sy39kQNhE15lSTFRjIjPcmr426Yl8nzW0v4x97PuGlB99v1lVQ183+f3EZU\neBh/u21hj8s3eyIijB0Rx9gRA3faY/iwKP7n2hmsem4nS3+7js/qnDfR0xKiuWT6KJZNTeOiSakk\nx3m2Wkh5zt9Af5Ex5riIjAQ+FJEj7h8aY4z1S+AsIrIKWAUwduxYP5uhBpLGVjurNxSybGpaj6Oz\nSSMTuHT6KNZsOcaqpRM7bsI1tdq54/mdbMyr5L6rzuXbiyf0es07l03k3X0nuO/vB/jgB0vOenLV\nF8YYNuVXsmjSCMK9vFk5MyOJKaPieTWnrNtAv7e0lrtf3EVbu4OXV104oIN0oFx63mi+fdF49h+v\n4yYrWdr0MYlBvRms/Fx1Y4w5bv1bDrwJzAdOicgYAOvfLnOgGmNWG2OyjTHZaWmeZ8ZTA9+azcXU\nNts6rbTpzh3LJlJ32sZL20sAqGlq46Ynt7G5oIrfXT/ToyAPEB0Rzq+vO5/P6k7z+w+O+tV+F2+2\n0DuTiPDV7Ez2lNaetSTyRN1p7nl5Dyv//Cmn29p55tb5TB2dEJA2Dwb3XT2dl2+/kLuWT2JGepIG\n+T7gc6AXkWEikuB6DVwKHADeBm6xqt0CvOVvI9Xg0dBi44mNhSyfmsbszORe688dO5yFE1J4YmMh\npdXNfPXxLRw6Uc9jN83lhuxMr649b1wKNy8cxzObi9ld4v9Cr4153s/Pu7tmTjoRYdJxk/R0WzsP\nr81l+e/X886+E9y5bCLrfrTMo35Syh/+/H07CnjTmjeNAF4wxvxLRHYAr4jIbcAx4Kv+N1MNFs98\n6vlo3uXOZZO45a/bufShDYSHCWtunc+FE71bB+7y48um8uGhU9z7+n7+8d2LiIrw/Y/WTXmVZI2I\n83m5X2p8NCumjeSNXceZNjqB371/lBN1LVx1/hjuvWKaLiNUfcbnnwJjTKExZpb1dZ4x5kGrvMoY\n80VjzGRjzMXGmMGV1ET5rLa5jdUbC7lk+ihmeTFKXTI5lZkZScRGhfPivy30OciDM4vkL1fO4Oip\nBlZv8D3NQpvdwdbCKp+mbdzdkJ1JZWMr97yyl9T4aF65/UL+fNNcDfKqT+kjaCpgHt9QSGOrnR9e\n6vloHpzz2c/dtgCg1/zpnrh4+iiumjmGRz/K58rzxzAhLd7rc+wuqaGprZ2LPEh70JNlU9P4xoXj\nmJmRzFfmpOt8tOoXmgIhRPxj72esejaHkir/kmn5qryhhWc+LeZLM89h2mjvH3hJio0MSJB3+e8v\nTScqIoxf/fNI75W7sCm/kvAw8euvC4DI8DB+sXIG18/L0CCv+o0G+hDQ1Grn/rcP8sGhU1z56Ebe\n2FXW57s4/e+6AtraHfzgEu9G88EyMiGG7yyfyIeHTrG5oLL3A86wMa+SWRlJvT7spNRgoIE+BDz9\naRFVTW386etzmD4mkXte2cv3XtpD3Wlb7wcHQFlNMy9sK+GGeRmMTx04T2V+a9F40pNjeeCdw7R7\nkWCsrtnGvrJav+fnlRooNNAPcnXNNh7fUMjF547i6pnn8OKqhfz4sqm8t/8EVz6ysWMTimB69KM8\nAL7XQ7Kq/hATGc5PrpjGoRP1vL7L87TBmwsqcRg8Skus1GCggX6Qe3xDQacboOFhwl3LJ/H6nV8g\nIly48Ymt/O79I72mh/VVYUUjr+86zk0Lx3JOcmxQruGPL80cw5yxyfz+/aM0WRtP92ZjfiXx0RFe\nrRxSaiDTQD+IVTS08rR1A/TcMZ1vgM7OTOa97
y3mhnkZ/HldAdc9tpmiyqaAt+GhtXlER4TxnWWT\nAn7uQBAR7rtqOuUNrR7vcrQpr5KFE0YQGa4/Hio06P/kQezP6/J7vAE6LDqC314/i8dumsuxqmau\nfGQjL+8oCdiN2kOf1fOPvZ9x66Is0hIGVlpcd/PGDefqmWNYvaGAE3Wne6x7rKqJkupmlkzRaRsV\nOjTQD1LHa0/zwrYSrp/b+w3QK84fw7++v5g5Y5P5yev7ufP5XdQ0tfndhj9+eJSEmAhWLZ7o97mC\n7SeXT8Nh4Hfv95wH5939zsSrvqY9UGog0kA/SD261roB2kMaYHdjkmJ5/rYF/OzKaXx05BSXP7Kh\nI9e6L3aV1LD2cDl3LJ1IUtzAX4KYmRLHtxaN541dx9lXVnvW5/nljXzrmR389l9HmTM2eUCtHlLK\nXxroB6HCikZe21XG1xeMJd2LG6BhYcKqJRN58zuLiI+O4KYnt/Hgu4dotbd73Ybfv3+U1PgovvmF\nLK+P7S/fWT6REcOieODdwx3TVzVNbdz/9kEuf3gDO4qq+dmV03hp1cKAb5KtVH/SFAiD0ENr84gK\nD+Ou5b7dAJ2RnsQ7313Mg+8d4omNRXySW8GvvjKz076u3alrtvHge4fYXFDFf149PSB53/tKYkwk\nP7hkCvf9/QDv7j9BRUMrD6/No6HFxtfmj+UHl0wZcFvwKRUIg+endIjYVljFW3s/Y9roBJZNGXnW\nZhSHTzhvgH5n2US/boDGRoXzwDXns2LaSP7jzQNc/5fN3HJhFj+6bGrHBiDujDH888BJ/uutg9Q0\nt3H70gnccmH3OycNVDdekMmzW4q5+4XdgHOt/H9cda5PaRuUGiw00PvBGMPJ+hbCRRiZ6NsWcC5b\nC6t4eG0uWwuriYoIo83uAA4yIXUYS6aksWxqGgsnjOAPH+SSEBPB7UsCcwN0xbRRfHjPCH73ryOs\n2VLMBwdP8sC1M1gx7fMdIE/WtfCfbx3gw0OnmJGeyDO3XuD11noDRUR4GA9eez5//CCXf1synuVT\nR+o0jQp50tc5UbqSnZ1tcnJyAna+wopGEmMjvfozvKapjeKqJhJiIkmOcybYOnMddUOLjf1ldewu\nrWVPaS17S2spb2gFYNroBJZOTWPZlJFkZw33eA32lgJngN9WVE1aQjR3LJ3I1+eP5WR9C+uPlrP+\naAVbC6totTuIjgij1e7gR5dO4e4VgX8KdeexGu59fR955Y18edY5/OfV03n/4El+888j2BwOfnjJ\nVG5dlEWEri9XakAQkZ3GmOxe64VSoN9RXM0ja/PYZK0myRgey+zM5I6vGelJxESG02pv5+Bn9ey1\nAvae0lqOdZH1MSE6gqQ4Z+BvsTkoqGjE1V0TUocxyzpvi62d9UcryDlWja3dEB8dwaJJI1g6ZSTj\nutkHtKHFxl8/LWZ7UTUjE6K5c9lEvjZ/LDGR4WfVbbG1s6Wwik+OVnCyroU/fHVW0ObG2+wOHltf\nwJ/X5eMwBrvDcNGkVP7n2vOHxJ6mSg0m/R7oReRy4BEgHHjSGPPr7uq6B3r3UfPe0loiwoUlk9NY\nOjWNMUldrzDZUVzNw2tz+TS/itT4KL510XgiwsQZxEtqO3abjwgTxqbEUVrTjK3d+X2PSoxmdmYy\nszKTmTwygeY2O3WnbdQ02ag93UZds43a0zbCBGZmOAP7zIykLneqb2y182l+JeuPVvDJ0fKO63Zn\nVGI0dy6dyI3dBPj+lF/ewMNr81g2dSTXzU3X6Q2lBqB+DfQiEg7kApcAZcAO4GvGmENd1Z9w7kxz\n3S+eY09pLfluo+bxqcNosbVzwgqY00YnsHSKM+hnj0thT2ktD6/NZXNBFanx0dyxdAI3LRhHbFTn\noFle3+KcaimrJe9UIxPS4pmdmcTszOGMTvJvbr07xhgKKpqo7ubBpDCh4y8MpZTyRX8H+guB+40x\nl1nvfwpgj
PlVV/Wjx0w2533nf50j64xkZo9NZpY1ajbGkHuqkU9ynfPVO4qd0yOu+eq0hGhuX9J1\ngFdKqVDmaaAP1qqbdKDU7X0ZsMC9goisAlYBpGeOY+d9F3c5PSAiTB2dwNTRCaxaMpHGVjub8yvZ\nXFDFuBFx3c5rK6WUcuq35ZXGmNXAanDO0Xs6BxwfHcGl543m0vNGB7N5SikVMoK1Tu44kOn2PsMq\nU0op1ceCFeh3AJNFZLyIRAE3Am8H6VpKKaV6EJSpG2OMXUTuBt7Hubzyr8aYg8G4llJKqZ4FbY7e\nGPMe8F6wzq+UUsoz+iy7UkqFOA30SikV4jTQK6VUiNNAr5RSIW5AZK8UkQag512bh55UwPdNXUOP\n9kdn2h9nG4p9Ms4Yk9ZbpYGy8chRT/I1DCUikqN98jntj860P86mfdI9nbpRSqkQp4FeKaVC3EAJ\n9Kv7uwEDkPZJZ9ofnWl/nE37pBsD4masUkqp4BkoI3qllFJBErRALyJ/FZFyETngVna/iBwXkT3W\n15VWeZSIPC0i+0Vkr4gs6+J8b7ufa7Dpqj+s8u+KyBEROSgiv3Ur/6mI5IvIURG5zK08WURes445\nbO3mNeh40x8iMkJE1olIo4j86Yz6USKyWkRyreOu68vvI5C87JNLRGSn9TOzU0RWuNUPiT7xsj/m\nu8WVvSJyrVv9kPiZ8YsxJihfwBJgLnDArex+4Edd1L0LeNp6PRLYCYS5ff4V4AX3cw22r276Yzmw\nFoh2fe/Wv9OBvUA0MB4oAMKtz9YA37ZeRwHJ/f299UF/DAMuAu4A/nTGeX4OPGC9DgNS+/t766M+\nmQOcY72eARwPtT7xsj/igAjr9Rig3O19SPzM+PMVtBG9MWYDUO1h9enAx9Zx5UAtkA0gIvHAPcAD\nQWhmn+mmP+4Efm2MabXqlFvlK4GXjDGtxpgiIB+YLyJJOP/zP2XVbzPG1PbJNxBg3vSHMabJGLMJ\naOniVN8CfmXVcxhjBu0DM172yW5jzGdWnYNArIhEW+9Dok+87I9mY4zdqhMDGIBQ+pnxR3/M0d8t\nIvusP8uGW2V7gS+LSISIjAfm8fkOVb8E/gA090Nbg20KsFhEtonIJyJygVXe1Z676ThH9xXA0yKy\nW0SeFJFhfdvkoOquP7okIsnWy1+KyC4ReVVERgW/mX3Kkz65DthljGkdAn3SbX+IyAIROQjsB+6w\nAn+o/8x4pK8D/WPARGA2cAJnAAf4K85glgM8DGwG2kVkNjDRGPNmH7ezr0QAKcBC4MfAKyI9bp4b\ngfNP2ceMMXOAJuDeoLey7/jSHxnAZmPMXGAL8Pugt7Jv9dgnInIe8Bvgdrf6odwn3faHMWabMeY8\n4ALgpyISQ+j/zHikTwO9MeaUMabdGOMAngDmW+V2Y8wPjDGzjTErgWQgF7gQyBaRYmATMEVE1vdl\nm4OsDHjDOG0HHDjzdXS3524ZUGaM2WaVv4bzP3Go6K4/ulOF8y+9N6z3rxJa/QE99ImIZABvAt8w\nxhRY9UO9T3r9P2KMOQw04rx3Eeo/Mx7p00AvImPc3l4LHLDK41x/TonIJYDdGHPIGPOYMeYcY0wW\nzptxucaYZX3Z5iD7O86bS4jIFJw3iipx7q97o4hEW1NZk4HtxpiTQKmITLWO/yJwqO+bHTTd9UeX\njDEG+AewzCoKtf6AbvrEmqJ5F7jXGPOpq/IQ6JPu+mO8iERY5eOAaUDxEPiZ8Uyw7vICL+KcnrHh\n/K16G/AczvmzfTiD2RirbhbO7JWHcd5RH9fF+bIY3KtuuuqPKOB5nL/wdgEr3Or/B87VNkeBK9zK\nZ+Oc4tqH8z/98P7+3vqoP4px3phrtOpPt8rHARus/vgIGNvf31tf9AlwH85piD1uX64VKCHRJ172\nx804b0rvscqvcTtPSPzM+POlT8YqpVSI0ydjlVIqxGmgV0qpEKeBXimlQpw
GeqWUCnEa6JVSKsRp\noFdKqRCngV4ppUKcBnqllApx/x96wyWBRjydrQAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Number of letters authored by Hugo de Groot per year\n", "hugo = 'Groot, Hugo de, 1583-1645'\n", "\n", "df['year'] = df['date'].str.extract('(\\d\\d\\d\\d)', expand=False)\n", "\n", "df_hugo = df[df['author'] == hugo]\n", "\n", "df_hugo['year'].value_counts().sort_index().plot()\n", "\n", "plt.show()\n" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Reigersberch, Nicolaas, 1584-1654 862\n", "Groot, Willem de, 1597-1662 726\n", "Oxenstierna, Axel (Count), 1583-1654 587\n", "Camerarius, Ludwig, 1573-1651 347\n", "Vossius, Gerardus Joannes, 1577-1649 339\n", "Marin, Charles, d.1651 122\n", "Oxenstierna, Johan Axelsson, 1611-1657 109\n", "Salvius, Johan Adler, 1590-1652 106\n", "Wicquefort, Joachim van, 1600-1670 99\n", "Heinsius, Daniel, 1580-1655 89\n", "Aubery du Maurier, Benjamin, 1566-1636 85\n", "Appelboom, Harald Andersson, 1612-1674 83\n", "Christina, Queen of Sweden, 1626-1689 77\n", "Schmalz, Peter Abel, fl. 1635-1638 61\n", "Uyttenbogaert, Johannes (Dr), 1557-1644 51\n", "Jaski, Israel, 1573-1642 39\n", "Bielke, Sten Svantesson, 1598-1638 39\n", "Spiring Silvercrona, Petter, 1600-1652 39\n", "Groot, Johan Hugo de, 1554-1640 37\n", "Lingelsheim, Georg Michael, 1556-1636 36\n", "Unknown 33\n", "Bernegger, Matthias, 1582-1640 29\n", "Müller, Georg, d.1639 29\n", "Sprecher von Bernegg, Fortunatus, 1585-1647 24\n", "Casaubon, Isaac, 1559-1614 23\n", "Grubbe, Lars, 1601-1642 22\n", "Camerarius, Joachim, 1603-1687 22\n", "Meursius, Johannes, 1579-1639 22\n", "Vossius, Isaac (Dr), 1618-1689 20\n", "Otto II, 1578-1637 19\n", " ... 
\n", "Skytte, Bengt, 1614-1683 1\n", "Gomaer, François, 1563-1641 1\n", "Barclay-Debonnaire, Louise, 1585-1652 1\n", "Emporagrius, Erik Gabrielsson, 1606-1674 1\n", "Cappel, Louis, 1585-1658 1\n", "Jack, Gilbert, 1577-1628 1\n", "Vossius, Gerardus, 1619-1640 1\n", "Aligre, Étienne, 1550-1635 1\n", "Höpfner, Heinrich, 1583-1642 1\n", "Aa, Anthony Willemsz., 1582-1638 1\n", "Rigault, Nicolas, 1577-1654 1\n", "Brederode, Reinoud, 1567-1633 1\n", "Jungermann, Gottfried, 1577-1610 1\n", "Wertheim de Rochefort, Johann Dietrich Löwenstein, 1585-1657 1\n", "L'Empereur, Constantine, 1591-1648 1\n", "Stringe, Johan, fl. 1622 1\n", "Forstner, Christoph von, 1598-1667 1\n", "Voisin, Joseph, 1610-1685 1\n", "Orléans, administration of the German Nation at 1\n", "Sweerts, Pierre François, 1567-1629 1\n", "Pas, Isaac Manassès de, 1590-1640 1\n", "Mesmes, Henri, d.1650 1\n", "Gardie, Magnus Gabriel de la, 1622-1686 1\n", "Wertheim de Virneburg, Friedrich Ludwig Löwenstein, 1598-1657 1\n", "Bogislaw, Ernst, 1620-1684 1\n", "Hohenlohe-Langenburg, Philipp Ernst von, 1584-1628 1\n", "Oldenbarnevelt, Willem van, 1590-1638 1\n", "Beauharnais, François, d.1651 1\n", "Vair, Guillaume, 1566-1621 1\n", "Menasseh, Ben Israel, 1604-1657 1\n", "Name: addressee, dtype: int64" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_hugo['addressee'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some collections include letters between correspondents of the collection creator, while others contain only letters in which the collection creator is the author or the addressee.\n", "\n", "E.g. 
the collection of correspondence of Hugo de Groot includes letters between his brother and his brother-in-law.\n", "\n" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Huygens, Christiaan, 1629-1695 1345\n", "Huygens, Constantijn, 1628-1697 175\n", "Oldenburg, Henry, 1619-1677 105\n", "Huygens, Constantijn, 1596-1687 75\n", "Sluse, René François de, 1622-1685 72\n", "Chapelain, Jean, 1595-1674 70\n", "Moray, Robert (Sir), 1608-1673 68\n", "Schooten, Frans van, 1615-1660 58\n", "Boulliau, Ismaël, 1605-1694 54\n", "Leibniz, Gottfried Wilhelm, 1646-1716 42\n", "Bruno, Henrick, 1617-1664 40\n", "Huygens, Susanna, 1637-1725 32\n", "Heinsius, Nicolaas, 1620-1681 28\n", "Petit, Pierre, 1598 or before-1677 24\n", "Medici, Leopoldo de', 1617-1675 24\n", "Doublet, Philips, 1633-1707 23\n", "Hudde, Johannes, 1628-1704 21\n", "L'Hôpital, Guillaume François Antoine de, 1661-1704 21\n", "Wallis, John (Dr), 1616-1703 20\n", "Mersenne, Marin, 1588-1648 20\n", "Fatio de Duillier, Nicolas, 1664-1753 20\n", "Saint-Vincent, Grégoire de, 1584-1667 19\n", "Kinner von Löwenthurn, Gottfried Alois, b.1610 18\n", "Mylon, Claude, 1618-1660 18\n", "Fermat, Pierre, 1601-1665 17\n", "Hire, Philippe de la, 1640-1718 16\n", "Gent, Pieter, b.1640 16\n", "Graaf, Jan, 1673-1697 16\n", "Hevelius, Johannes, 1611-1687 15\n", "Thévenot, Melchisédech, 1620-1692 13\n", " ... 
\n", "Hobbes, Thomas, 1588-1679 1\n", "Mathion, Oded Louis, 1620-1700 1\n", "Louise Hollandine, Countess Palatine, 1622-1709 1\n", "Regnauld, André, d.1702 1\n", "Benoit, Antoine, 1632-1717 1\n", "Limojon, Alexandre-Toussaint de, 1630-1689 1\n", "Leeuwenhoek, Antoni van, 1632-1723 1\n", "Gillet, Pierre François, 1648-1720 1\n", "Nassau-Siegen, Hendrik of, 1611-1652 1\n", "Boecler, Johann Heinrich, 1611-1672 1\n", "Lely, Peter, 1618-1680 1\n", "Christina, Queen of Sweden, 1626-1689 1\n", "Kircher, Athanasius, 1601-1680 1\n", "Vossius, Isaac (Dr), 1618-1689 1\n", "Molyneux, Thomas (Sir), 1661-1733 1\n", "Briou, fl. 1675 1\n", "Douw, Simon, 1620-1663 1\n", "Court of Holland, Zeeland and West Friesland, 1\n", "Alberghetti, Sigismondo, d.1702 1\n", "Placentius, Johann, d.1683 1\n", "Wijk, Johan van der, 1625-1679 1\n", "Magalotti, Lorenzo, 1637-1712 1\n", "Varignon, Pierre, 1654-1722 1\n", "Cock, Christopher, fl. 1684 1\n", "Dodart, Denis, 1634-1707 1\n", "Holmes, Robert (Sir), 1622-1692 1\n", "Vallot, Antoine, 1594-1671 1\n", "Bilberg, Johann, 1650-1717 1\n", "Loménie, Henri-Auguste, 1595-1666 1\n", "Smethwick, Francis, d.1682 1\n", "Name: author, dtype: int64" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_christiaan = df[(df['author'] == 'Huygens, Christiaan, 1629-1695') | (df['addressee'] == 'Huygens, Christiaan, 1629-1695')]\n", "\n", "df_christiaan = df[df['collection'] == 'Huygens, Christiaan']\n", "df_christiaan['author'].value_counts()\n" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Huygens, Constantijn, 1596-1687 4252\n", "Solms-Braunfels, Amalia von, 1602-1675 768\n", "Huygens, Christiaan, 1551-1624 94\n", "Barlaeus, Caspar, 1584-1648 81\n", "Sauzin, Jean 68\n", "Heinsius, Daniel, 1580-1655 63\n", "Lionne, Hugues de, 1611-1671 44\n", "Rivet, André, 1572-1651 44\n", "Hooft, Pieter Cornelius, 1581-1647 40\n", "William III and II, King of England, 
Scotland, and Ireland, 1650-1702 39\n", "Beringhen, Henri, 1603-1692 33\n", "Chièze, Sebastien, 1625-1679 29\n", "Unknown 28\n", "Langes de Montmirail, Frédéric de, 1630-1697 26\n", "Leu de Wilhem, David le, 1588-1658 25\n", "Chambrun, Jacques Pineton, 1635-1689 23\n", "Dohna, Frederick von, 1621-1688 21\n", "Council of the Prince, fl. 1656-1664 20\n", "Loménie, Henri Louis, 1635-1698 20\n", "Swann-Ogle, Utricia, 1611-1674 19\n", "Ban, Jan Albert, 1598-1644 19\n", "Jermyn, Henry, 1605-1684 18\n", "Puteanus, Erycius, 1574-1646 18\n", "Nassau-Siegen, Hendrik of, 1611-1652 18\n", "Westerbaen, Jacob, 1599-1670 17\n", "Cusance, Béatrix, 1614-1663 17\n", "Schurman, Anna Maria van, 1607-1678 17\n", "Boxhorn, Marcus Zuerius van, 1612-1653 15\n", "Bennet, Henry, 1618-1685 15\n", "Le Tellier, François Michel, 1641-1691 14\n", " ... \n", "Wierts, Joan, fl. 1650-1692 1\n", "Muelen, Andries, 1591-1654 1\n", "Molino, Domenico, 1573-1635 1\n", "Burg, fl. 1662 1\n", "Frédéric-Armand, comte de Schomberg, 1615-1690 1\n", "Fürstenberg, Ferdinand von, 1626-1683 1\n", "Colvius, Andreas, 1594-1671 1\n", "Sprecher von Bernegg, Fortunatus, 1585-1647 1\n", "Sipenesse, Cornelis, d.1635 1\n", "Santen, Jan, fl. 1635-1649 1\n", "Caron, Suzette, fl. 1669-1689 1\n", "Amat, Angélique, fl. 1666 1\n", "Brancas, Marie, fl. 1613-1662 1\n", "Nicholas, Edward (Sir), 1593-1669 1\n", "Beauvais, Charles, b.1590 1\n", "Sohier, Nicolaas, fl. 1638 1\n", "Zuerius (Miss), fl. 1667 1\n", "Sylvius, Jean, fl. 1662-1666 1\n", "Does, Jacob, 1641-1680 1\n", "Huygen, Johan, fl. 1632-1640 1\n", "Coesmans, Jan, fl. 1643-1656 1\n", "Brederode, Juliana, 1622-1678 1\n", "Zuylen van Nyevelt, Mechtelt, fl. 1666 1\n", "Berckel, Clemens, fl. 1613 1\n", "Cotton, John (Sir), 1621-1702 1\n", "Schagen, Diederik, fl. 
1660 1\n", "Magerus, Petrus, 1609-1653 1\n", "Petit, Pierre, 1598 or before-1677 1\n", "Nanteuil, Robert, 1623-1678 1\n", "Enclos, Anne de l', 1620-1705 1\n", "Name: addressee, dtype: int64" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_constantijn = df[df['collection'] == 'Huygens, Constantijn']\n", "df_constantijn['addressee'].value_counts()\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df['author_freq'] = df.groupby(['author'])['id'].transform('count')\n", "df['addressee_freq'] = df.groupby(['addressee'])['id'].transform('count')\n", "df['correspondents_freq'] = df.author_freq + df.addressee_freq" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "id author addressee \n", "0000ab7c-f54a-493a-a066-ee929eedd1e3 Johnson, John, 1662-1725 Charlett, Arthur (Reverend), 1655-1722 1\n", "0000bd85-9139-4fec-b362-78b3f4a6b9c2 Boywer, F., fl. 1737 Rawlinson, Richard (Dr), 1690-1755 1\n", "0000d067-74f1-46ed-b4bc-e9d2765091e6 Alciatus, Francesco (Cardinal), 1522-1580 Aytta, Viglius Zuichemius ab, 1507-1577 1\n", "0002ce4c-db77-4863-a1ec-e8be9b9d121b Buffon, George Louis Leclerc de, 1707-1788 Jurin, James, 1684-1750 1\n", "0002dbb4-2785-4b05-aa59-2bbb3f654802 Sandford, Daniel (Reverend), 1729-1770 Ballard, George, 1705-1755 1\n", " ..\n", "fffdf05b-cba6-442e-a563-18598314a021 Howel, John, fl. 1760-1781 Gough, Richard, 1735-1809 1\n", "fffe4dd7-c074-4de0-b422-dc1c635eca0f Villiers, Christophe de, fl. 1633-1639 Mersenne, Marin, 1588-1648 1\n", "fffe9c50-5132-476e-9173-9b680d0f916e August II of Braunschweig-Wolfenbüttel, 1579-1666 Andreae, Johann Valentin, 1586-1654 1\n", "fffefd34-9784-4673-920e-1696883ef2a2 Zapata, Rodrigo, fl. 
1574 Agustín, Antonio, 1517-1586 1\n", "ffffcc85-6dbc-4a57-8942-40323c4d18fc Willis, Browne, 1682-1760 Rawlinson, Richard (Dr), 1690-1755 1\n", "Length: 127418, dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(['id', 'author', 'addressee']).size()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "ids = list(df.sort_values('correspondents_freq').id)\n", "auths = list(df.sort_values('correspondents_freq').author)\n", "addrs = list(df.sort_values('correspondents_freq').addressee)\n", "auth_freqs = list(df.sort_values('correspondents_freq').author_freq)\n", "addr_freqs = list(df.sort_values('correspondents_freq').addressee_freq)\n", "\n", "auths = [auth if isinstance(auth, str) else None for auth in auths]\n", "addrs = [addr if isinstance(addr, str) else None for addr in addrs]\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'addr': 'Jones, Robert (Reverend), fl. 
1698',\n", " 'addr_freq': 1.0,\n", " 'auth': 'Meare, John, 1649-1710',\n", " 'author_freq': 1.0,\n", " 'id': 'aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corrs = [{'id': id, 'auth': auth, 'addr': addr, 'author_freq': auth_freq, 'addr_freq': addr_freq} for id, auth, addr, auth_freq, addr_freq in zip(ids, auths, addrs, auth_freqs, addr_freqs)]\n", "\n", "corrs[0]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1942\n", "3873\n" ] } ], "source": [ "from collections import OrderedDict\n", "queued = {}\n", "fetch = OrderedDict()\n", "seen = {}\n", "# First pass: queue a letter only if neither correspondent is covered yet\n", "for corr in corrs:\n", "    if corr['auth'] not in queued and corr['addr'] not in queued:\n", "        queued[corr['auth']] = corr['id']\n", "        queued[corr['addr']] = corr['id']\n", "        fetch[corr['id']] = corr\n", "\n", "print(len(fetch.keys()))\n", "print(len(queued.keys()))\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13827\n", "15758\n" ] } ], "source": [ "# Second pass: queue a letter for each correspondent that is still not covered\n", "for corr in corrs:\n", "    if corr['auth'] not in queued:\n", "        queued[corr['auth']] = corr['id']\n", "        fetch[corr['id']] = corr\n", "    elif corr['addr'] not in queued:\n", "        queued[corr['addr']] = corr['id']\n", "        fetch[corr['id']] = corr\n", "\n", "print(len(fetch.keys()))\n", "print(len(queued.keys()))\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://emlo.bodleian.ox.ac.uk/profile/work/aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a\n" ] } ], "source": [ "for corr_id in fetch:\n", "    url = f'http://emlo.bodleian.ox.ac.uk/profile/work/{corr_id}'\n", "    print(url)\n", "    break" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Unnamed: 0</th>\n", "      <th>id</th>\n", "      <th>type</th>\n", "      <th>collection</th>\n", "      <th>date</th>\n", "      <th>author</th>\n", "      <th>addressee</th>\n", "      <th>origin</th>\n", "      <th>destination</th>\n", "      <th>repository</th>\n", "      <th>author_freq</th>\n", "      <th>addressee_freq</th>\n", "      <th>correspondents_freq</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>73349</th>\n", "      <td>17320</td>\n", "      <td>aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a</td>\n", "      <td>Letter</td>\n", "      <td>Bodleian card catalogue</td>\n", "      <td>30 August 1698</td>\n", "      <td>Meare, John, 1649-1710</td>\n", "      <td>Jones, Robert (Reverend), fl. 1698</td>\n", "      <td>Oxfordshire, England</td>\n", "      <td>NaN</td>\n", "      <td>Bodleian Library, University of Oxford: MS Bal...</td>\n", "      <td>1.0</td>\n", "      <td>1.0</td>\n", "      <td>2.0</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Unnamed: 0 id type \\\n", "73349 17320 aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a Letter \n", "\n", " collection date author \\\n", "73349 Bodleian card catalogue 30 August 1698 Meare, John, 1649-1710 \n", "\n", " addressee origin destination \\\n", "73349 Jones, Robert (Reverend), fl. 1698 Oxfordshire, England NaN \n", "\n", " repository author_freq \\\n", "73349 Bodleian Library, University of Oxford: MS Bal... 1.0 \n", "\n", " addressee_freq correspondents_freq \n", "73349 1.0 2.0 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import requests\n", "from bs4 import BeautifulSoup as bsoup\n", "\n", "df[df.id == 'aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a']\n", "\n", "#response = requests.get(url)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "def get_relation_info(rel_type, detail_soup):\n", "    rel_type_soup = detail_soup.find_all(class_=rel_type)\n", "    if len(rel_type_soup) == 0:\n", "        return None\n", "    relation_soup = rel_type_soup[0].find_all(class_='relations')[0]\n", "    return {\n", "        'relation_type': rel_type.split(' '),\n", "        'relation_text': [string for string in relation_soup.stripped_strings]\n", "    }\n", "\n", "def get_provenance(page_soup):\n", "    prov_soup = page_soup.find_all(class_='provenance')[0]\n", "    prov = prov_soup.text\n", "    return prov.replace('Source of data: ','')\n", "\n", "def get_page_details(corr_id, page_soup):\n", "    page_details = {\n", "        'correspondence_id': corr_id,\n", "        'relations': [],\n", "        'provenance': get_provenance(page_soup)\n", "    }\n", "    detail_soup = page_soup.find(id='details')\n", "    if detail_soup:\n", "        rel_types = ['people authors', 'people recipients', 'locations origin', 'locations destination']\n", "        relation_info = [get_relation_info(rel_type, detail_soup) for rel_type in rel_types]\n", "        page_details['relations'] = [relation for relation in relation_info if relation is not None]\n", "    return page_details\n", "\n", 
"def get_correspondence_page(corr_id):\n", "    url = f'http://emlo.bodleian.ox.ac.uk/profile/work/{corr_id}'\n", "    response = requests.get(url)\n", "    page_soup = bsoup(response.content)\n", "    return get_page_details(corr_id, page_soup)\n", "\n", "corr_id = 'aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a'\n", "#detail_doc = get_page_details(corr_id, page_soup)\n", "detail_index = 'emlo_page_details'\n", "from elasticsearch import Elasticsearch\n", "\n", "es = Elasticsearch()\n", "\n", "#es.index(index=detail_index, doc_type='page_detail', id=detail_doc['correspondence_id'], body=detail_doc)\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'addr': 'Jones, Robert (Reverend), fl. 1698',\n", " 'addr_freq': 1.0,\n", " 'auth': 'Meare, John, 1649-1710',\n", " 'author_freq': 1.0,\n", " 'id': 'aabeebf7-4c5b-4bc2-a2ec-8ace326cfa7a'}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import time\n", "\n", "headers = {\n", "    'user-agent': 'DataScopesAnalyzer (https://marijnkoolen.github.io/Data-Scopes-Developers-2018/)',\n", "    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',\n", "    'Accept-Language': 'en-gb',\n", "}\n", "\n", "fetch[corr_id]\n", "#time.sleep(10)" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "skipped 1000\n", "skipped 2000\n", "skipped 3000\n", "skipped 4000\n", "skipped 5000\n", "skipped 6000\n", "skipped 7000\n", "skipped 8000\n", "skipped 9000\n", "skipped 10000\n", "skipped 11000\n", "skipped 12000\n", "12100 correspondence pages fetched\n", "12200 correspondence pages fetched\n", "12300 correspondence pages fetched\n", "12400 correspondence pages fetched\n", "12500 correspondence pages fetched\n", "12600 correspondence pages fetched\n", "12700 correspondence pages fetched\n", "12800 correspondence pages fetched\n", "12900 correspondence pages 
fetched\n", "13000 correspondence pages fetched\n", "13100 correspondence pages fetched\n", "13200 correspondence pages fetched\n", "13300 correspondence pages fetched\n", "13400 correspondence pages fetched\n", "13500 correspondence pages fetched\n", "13600 correspondence pages fetched\n", "13700 correspondence pages fetched\n", "13800 correspondence pages fetched\n" ] } ], "source": [ "from elasticsearch import exceptions\n", "\n", "skip = 0\n", "\n", "for ci, corr_id in enumerate(fetch):\n", "    if es.exists(index=detail_index, id=corr_id):\n", "        #print('skip', corr_id)\n", "        skip += 1\n", "        if skip % 1000 == 0:\n", "            print('skipped', skip)\n", "        continue\n", "    #print('fetching page for', corr_id)\n", "    detail_doc = get_correspondence_page(corr_id)\n", "    try:\n", "        detail_doc['author'] = fetch[corr_id]['auth']\n", "        detail_doc['addressee'] = fetch[corr_id]['addr']\n", "    except TypeError:\n", "        print(fetch[corr_id])\n", "        raise\n", "    try:\n", "        es.index(index=detail_index, doc_type='page_detail', id=detail_doc['correspondence_id'], body=detail_doc)\n", "    except exceptions.RequestError:\n", "        print(detail_doc)\n", "        raise\n", "    time.sleep(10)\n", "    if (ci+1) % 100 == 0:\n", "        print(ci+1, 'correspondence pages fetched')\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }