\n", "H0 (null hypothesis) - There is no correlation between sardine catch and sardine larvae

\n", "H1 (alternate hypothesis) - There is a correlation between sardine catch and sardine larvae\n", "\n", "In simpler terms, if our p-value is below 5%, we reject the null hypothesis and conclude that the two variables we are testing are linearly correlated. Thus, from our Pearson correlation result, we can conclude that there is a positive linear correlation between sardine larvae and sardine catch. \n", "\n", "\n", "### Lagged Correlation and Analysis\n", "\n", "Now what if we want to see whether there is any connection between fish larvae and the adult fish caught in later years? We can visualize this through a lagged correlation. According to [NOAA](https://www.fisheries.noaa.gov/species/pacific-sardine#:~:text=They%20reproduce%20at%20age%201,hatch%20in%20about%203%20days), it takes about 1-2 years, depending on conditions, for the Pacific sardine to mature and become able to reproduce. Thus, we can set back the catch lbs data by 1 year to account for the time it takes for the sardine larvae to reach adulthood. We chose 1 year as our parameter due to this [article](http://calcofi.org/~calcofi/publications/calcofireports/v37/Vol_37_Butler_etal.pdf), which explains how after the first population collapse of the 1940s, most Pacific sardine were able to reproduce at age 1, with some individuals being able to do so even earlier. 
Then we can plot and visualize our results as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide_input" ] }, "outputs": [], "source": [ "X = sardine_data2['CatchLbs'].values.reshape(-1,1)\n", "Y = sardine_data2['Count'].values.reshape(-1,1)\n", "linear_regressor = LinearRegression()\n", "linear_regressor.fit(X, Y)\n", "Y_pred = linear_regressor.predict(X)\n", "Y = np.array(Y).reshape(-1,)\n", "X = np.array(X).reshape(-1,)\n", "\n", "fig = px.scatter(sardine_data2, x='CatchLbs', y='Count', trendline=\"ols\", labels = {\"Count\" : 'Sardine Abundance'}, title = 'Lagged Correlation for Sardine Larvae vs Sardine Catch')\n", "fig.show()\n", "print(\"Pearson Correlation:\", stats.pearsonr(X, Y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introducing Cross Correlation\n", "\n", "We sought to investigate this lagged relationship in more detail. The tool we ended up settling on is called *cross correlation*. Cross correlation is similar to regular correlation, in the sense that it measures a relationship between two variables. However, cross correlation has the additional parameter of time. In other words, it allows one of the variables to be \"offset\" to see if there is a relationship when the times aren't perfectly aligned.\n", "\n", "We thought this would be perfect for our data - if we saw a large increase in larval abundance for a particular year, for example, we would expect to see a larger fishery catch some time later, which would result in a larger correlation for that specific year offset. Check out what we discovered by hitting the play button below!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide_input" ] }, "outputs": [], "source": [ "COMMON_TO_SCIENTIFIC = {\"Anchovy, northern\": 'Engraulis.mordax', \n", "\"Mackerel, jack\": 'Trachurus.symmetricus', \n", "\"Mackerel, Pacific\": 'Scomber.japonicus',\n", "\"Opah\": 'Lampridiformes1',\n", "\"Sardine, Pacific\": 'Sardinops.sagax',\n", "# \"Yellowtail\": 'Seriola.lalandi' # Not enough datapoints for yellowtail\n", "}\n", "\n", "\n", "\n", "def group_by_year(scientificName, commonName, treat_NaN_as_zeros = False):\n", " \"\"\"\n", " Returns a DataFrame with columns [Year, Fishery, Larva] where Fishery is the total pounds caught and Larva is the summed larval catch for the species in that year.\n", "\n", " scientificName: Scientific name of the target fish (for larval data)\n", " commonName: Common name of the target fish (for fishery data) \n", " treat_NaN_as_zeros: setting it to True will treat a missing larval catch for a certain year as 0. Setting it to False will ignore the whole year (False default)\n", " \"\"\"\n", "\n", " # Find all catches with the current species (copy so the assignments below don't write into a view)\n", " catches_with_species = cleaned_fishery[cleaned_fishery['Species Name'] == commonName].copy()\n", "\n", " # interpret them as floats and group by sums of pounds per year\n", " catches_with_species.loc[:,'Year'] = catches_with_species['Year'].astype(float)\n", " catches_with_species.loc[:,'Pounds'] = catches_with_species['Pounds'].astype(float)\n", " catches_with_species = catches_with_species.groupby('Year').sum()\n", "\n", "\n", " # find all larval catches of the species after 1980 and sum them per year\n", " caught_species_larva = larva_orig[larva_orig[scientificName] != 0]\n", " caught_species_larva = caught_species_larva[caught_species_larva['year'] > 1980]\n", " caught_species_larva = caught_species_larva.groupby('year').sum()\n", "\n", " larva_array = caught_species_larva[scientificName]\n", " result = []\n", "\n", " # Series.iteritems() was removed in pandas 2.0; items() is the equivalent\n", " for year, caught_pounds in catches_with_species['Pounds'].items():\n", 
" try:\n", " result.append(np.array([year, caught_pounds, larva_array[int(year)]])) # this will fail if there were no larval catches for this species for the year!\n", " except:\n", " if treat_NaN_as_zeros:\n", " result.append(np.array([year, caught_pounds, 0])) # if we want to treat a nonexistant larval catch as 0\n", "\n", " # Result will be a 2 dimensional np array where the first column is a year, so construct a dataframe from it\n", " return pd.DataFrame(data = np.array(result), columns=[\"Year\", \"Fishery\", \"Larva\"])\n", "\n", "def local_correlation(df):\n", " \"\"\"\n", " Returns a pearson correlation between the Fishery and Larva columns of the dataframe passed in\n", " \"\"\"\n", " return scipy.stats.pearsonr(df['Fishery'], df['Larva'])\n", "\n", "\n", "\n", "def offset_larva_catch(scientificName, commonName, offset, treat_NaN_as_zeros = False):\n", " \"\"\"\n", " Returns a modified version of the dataset where the Fishery Catches are shifted later by the offset. For example, if a certain fish had n catches in 2008, and offset is 2, \n", " the returned dataset would have n in 2010. This is useful in calculating correlation with offset year.\n", "\n", " scientificName: Scientific name of the target fish (for larval data)\n", " commonName: Common name of the target fish (for fishery data) \n", " offset: the amount of years to shift fishery catches by. If offset is negative, the larva data will be set back.\n", " treat_NaN_as_zeros: setting it to True will treat a missing larval catch for a certain year to 0. 
Setting it to False will ignore the whole year (False default)\n", " \"\"\"\n", " orig_dataset = group_by_year(scientificName, commonName, treat_NaN_as_zeros).to_numpy() # convert the dataset to numpy for easier indexing\n", " result = []\n", " if offset > 0:\n", " for i in range(len(orig_dataset) - abs(offset)):\n", " result.append(np.array([orig_dataset[i+offset][0], orig_dataset[i+offset][1], orig_dataset[i][2]])) # pair the fishery catch with the larval catch from `offset` rows earlier\n", " else:\n", " for i in range(len(orig_dataset) - abs(offset)):\n", " result.append(np.array([orig_dataset[i][0], orig_dataset[i][1], orig_dataset[i-offset][2]])) # zero or negative offset: pair the fishery catch with the larval catch from |offset| rows later\n", " if not result:\n", " return pd.DataFrame(columns=[\"Year\", \"Fishery\", \"Larva\"])\n", " return pd.DataFrame(data = np.array(result), columns=[\"Year\", \"Fishery\", \"Larva\"]) #convert back to dataframe and return it\n", "\n", " \n", "\n", "\n", "larva_orig = pd.read_csv('data/Fishlarvaldata_Capstone_2021_FromAndrewThompson_updated 1804 1904 1507 1607 1601 1704 1604 1501 1407 1311 ichthyoplankton by line and station.csv')\n", "fishery_updated = pd.read_csv('data/2232021_SummaryByQuarter_blockgrouping_87-20_210223_Redacted.csv')\n", "\n", "# clean data \n", "cleaned_fishery = fishery_updated.dropna(how='any')\n", "cleaned_fishery = cleaned_fishery[cleaned_fishery['Total Price'] != ' ']\n", "\n", "curr_list = [] # (offset, correlation) pairs for the species\n", "scientific = 'Sardinops.sagax'\n", "common = 'Sardine, Pacific'\n", "for year in range(-3, 8):\n", " offset_df = offset_larva_catch(scientific, common, year) # build the dataset shifted by this many years\n", " if len(offset_df) < 2:\n", " continue\n", " corr = local_correlation(offset_df) # find the correlation with the current fish and offset\n", " curr_list.append(np.array([year, corr[0]]))\n", "curr_list = np.array(curr_list)\n", "\n", "\n", "\n", "\n", "frames = []\n", "# one frame per prefix of the points; the + 1 makes the final frame include the last offset\n", "for i in range(1, len(curr_list) + 1):\n", 
frames.append(go.Frame(data=[go.Scatter(x=curr_list[:i, 0], y=curr_list[:i, 1])]))\n", "\n", "fig = go.Figure(\n", " data=[go.Scatter(x=curr_list[:1, 0], y=curr_list[:1, 1])],\n", " layout=go.Layout(\n", " xaxis=dict(range=[-3, 6], autorange=False),\n", " yaxis=dict(range=[0, 1], autorange=False),\n", " title=\"Correlation vs Years Offset (Sardines)\",\n", " xaxis_title = \"Years Offset (Fishery Catch year - Larval Catch Year)\",\n", " yaxis_title = \"Pearson Correlation\",\n", " updatemenus=[dict(\n", " type=\"buttons\",\n", " buttons=[dict(label=\"Play\",\n", " method=\"animate\",\n", " args=[None])])]\n", " ),\n", " frames=frames\n", ")\n", "\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we realized that correlations peaked with an offset of several years between the larvae and fishery catch date. This appears to support our hypothesis that a high larva catch causes a high fishery catch several years later.\n", "\n", "If this is further investigated, it could have very exciting impacts, both environmental and economical. Being able to accurately predict fishery catches in the future can allow humans to more quickly respond to climate threats, and allow companies to make more intelligent economic decisions based on how abundant the catch will be!" ] } ], "metadata": { "language_info": {}, "orig_nbformat": 3 }, "nbformat": 4, "nbformat_minor": 2 }