Parameter Refactoring
We are working on a new feature to use LineaPy for parameter refactoring.
A Jupyter Notebook user (for example, a data scientist) has developed a process that takes some literal values (parameters) and produces some results (artifacts). Once the notebook works end-to-end on one or a few examples, they want to generalize the process and apply it to the entire data set. This essentially means wrapping a piece of code in a for loop (or nested for loops).
This is repetitive and error-prone whenever we want to add new parameters or adjust the process, since it involves a lot of copy-pasting and nonlinear cell execution.
Here are some examples:
If we are data scientists working on an NLP problem, we first do the feature engineering on one or a few text files, and then we want to extend the process to our entire corpus. In this case, we want to parameterize the file path so the process can take any file in the corpus.
If we are ML engineers optimizing a scikit-learn random forest classifier, all parameters of sklearn.ensemble.RandomForestClassifier might be our parameters.
If we are data engineers developing a web scraper, we first make sure our script works on one page; then we apply it to multiple pages. In this case, our parameter is the URL.
If we are data analysts at a retailer developing a dashboard that displays the daily sales results of each store, we first make the report run for one day at a given store, then extend it to every day and every store. In this case, the date and the store id are our parameters.
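To make the manual refactoring concrete, here is a minimal sketch of the retailer case done by hand: the date and store id that were once literals in notebook cells become function arguments, and the function is called inside nested for loops. The function name `daily_sales_report`, the data layout, and the sample values are all hypothetical and only serve as an illustration.

```python
def daily_sales_report(date, store_id, sales):
    """Total sales for one store on one day (values formerly hard-coded in cells)."""
    return sum(amount for d, s, amount in sales if d == date and s == store_id)


# Hypothetical sample data: (date, store id, sale amount)
sales = [
    ("2022-01-01", "store_a", 120.0),
    ("2022-01-01", "store_b", 80.0),
    ("2022-01-02", "store_a", 200.0),
    ("2022-01-02", "store_a", 50.0),
]

dates = ["2022-01-01", "2022-01-02"]
store_ids = ["store_a", "store_b"]

# The generalization step: wrap the single-example code in nested for loops.
reports = {}
for date in dates:
    for store_id in store_ids:
        reports[date, store_id] = daily_sales_report(date, store_id, sales)
```

Every new parameter adds another argument, another loop level, and another round of edits to the original cells, which is where the copy-paste errors tend to creep in.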
We think LineaPy can simplify this process in the following way.
Proposed API
A new API that lets users specify the input parameters and output variables:
ft = lineapy.get_function(
    output_objects=['artifact_1'],
    input_parameters=['var1', 'var2'],
)

# Apply the function to a new set of parameters
new_result = ft(var1='new_var1', var2='new_var2')

# Apply the function to a parameter grid
new_result = {}
for var1 in list_of_new_var1s:
    for var2 in list_of_new_var2s:
        new_result[var1, var2] = ft(var1, var2)
This is a proposed API, not the final implementation.
For any suggestions or feature requests, please respond to this discussion.
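To give a feel for the idea, here is a toy sketch in plain Python of what "turn literals into parameters" could mean. This is not how LineaPy works and not part of the proposal: `get_function_sketch` is a hypothetical helper that takes the notebook code as a string, replaces top-level assignments to the named input variables with caller-supplied values, runs the rest, and returns the requested outputs as a dictionary. It only handles simple `name = value` assignments at the top level and accepts keyword arguments only.

```python
import ast


def _simple_assignment_target(node):
    """Return the variable name for a top-level `name = value` statement, else None."""
    if (
        isinstance(node, ast.Assign)
        and len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
    ):
        return node.targets[0].id
    return None


def get_function_sketch(source, output_objects, input_parameters):
    """Build a callable that re-runs `source` with selected literals overridden."""
    tree = ast.parse(source)

    def run(**overrides):
        unknown = set(overrides) - set(input_parameters)
        if unknown:
            raise TypeError(f"unexpected parameters: {sorted(unknown)}")
        namespace = {}
        for node in tree.body:
            target = _simple_assignment_target(node)
            if target in overrides:
                # Use the caller's value instead of the original literal.
                namespace[target] = overrides[target]
                continue
            code = compile(ast.Module(body=[node], type_ignores=[]), "<notebook>", "exec")
            exec(code, namespace)
        return {name: namespace[name] for name in output_objects}

    return run


# Hypothetical notebook code with two literals and one artifact.
notebook_source = """
var1 = 2
var2 = 3
artifact_1 = var1 * 10 + var2
"""

ft = get_function_sketch(notebook_source, ["artifact_1"], ["var1", "var2"])
default_result = ft()              # runs with the original literals
new_result = ft(var1=5, var2=7)    # runs with new parameter values
```

A real implementation would need to know which code actually contributes to each artifact rather than replaying the whole source, which is the part a tool that tracks notebook execution is positioned to provide.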
Example Usage
Let's use a pairs trade as a more concrete example.
A pairs trade is a trading strategy that involves long and short positions in two stocks with high correlation.
Here is a code example to calculate the percentage return correlation between MSFT and AAPL in the first twenty trading days of 2017.
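As a rough illustration of such a notebook cell, here is a minimal sketch. It is an assumption, not the original code: prices come from a hypothetical `fake_price_history` helper that generates a deterministic random walk instead of downloading market data, so the resulting number is meaningless. The variable names `ticker1`, `ticker2`, `start`, `n_day`, `history1`, `history2`, and `correlation` follow the names used in the examples below.

```python
import math
import random


def fake_price_history(ticker, start, n_day):
    """Hypothetical stand-in for a market-data download: a seeded random walk of closes."""
    rng = random.Random(f"{ticker}:{start}")
    price, closes = 100.0, []
    for _ in range(n_day):
        price *= 1 + rng.gauss(0, 0.01)
        closes.append(price)
    return closes


def pct_returns(closes):
    """Day-over-day percentage returns."""
    return [later / earlier - 1 for earlier, later in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Literal values that a notebook user would later want to turn into parameters.
ticker1 = "MSFT"
ticker2 = "AAPL"
start = "2017-01-01"
n_day = 20

history1 = fake_price_history(ticker1, start, n_day)
history2 = fake_price_history(ticker2, start, n_day)
correlation = pearson(pct_returns(history1), pct_returns(history2))
```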
Depending on your use case, you might want to parameterize the above code in different ways.
If you want to compare the correlation of MSFT with other stocks over the same period as in the example above, you can run:
correlation_with_MSFT = lineapy.get_function(['correlation'], input_parameters=['ticker2'])

# Compare correlation between MSFT and AMZN
correlation_with_MSFT('AMZN')

# Compare correlation between MSFT and TSLA
correlation_with_MSFT('TSLA')
If you want to compute the correlation between any two stocks (and keep the underlying data), starting from any given date and lasting for an arbitrary number of days, you can run:
two_stock_correlations = lineapy.get_function(
    ['history1', 'history2', 'correlation'],
    input_parameters=['ticker1', 'ticker2', 'start', 'n_day']
)

# Compare correlation between SPY and QQQ, starting from 2010-01-01 and lasting 100 days.
two_stock_correlations('SPY', 'QQQ', '2010-01-01', 100)