Example usage
To use okridge in a project:
Import Necessary Packages
import os
import numpy as np
from okridge.tree import BNBTree
from okridge.utils import download_file_from_google_drive
from pathlib import Path
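If the imports above fail with a ModuleNotFoundError, the environment is missing one of the dependencies. The following is a minimal sketch of a pre-flight check using only the standard library; the helper name `missing_modules` is hypothetical and not part of okridge.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages used by the example; install whatever is reported as missing.
required = ["numpy", "okridge"]
not_found = missing_modules(required)
if not_found:
    print("Missing packages:", ", ".join(not_found))
else:
    print("All required packages are importable.")
```

Running this before the imports turns a traceback into a short list of packages to install.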
Download Sample Synthetic Data
data_file_path = "../tests/Synthetic_n=6000_p=3000_k=10_rho=0.5_snr=5.0_seed=0.npy"
if not os.path.isfile(data_file_path):
    download_file_from_google_drive('1lizlnufRBmEzMNpr0OlgE-P7otC8opkX', data_file_path)
loaded_data = np.load(data_file_path, allow_pickle=True)
X, y = loaded_data.item().get("X"), loaded_data.item().get("y")
print("Shape of feature matrix is", X.shape)
print("There are {} number of samples".format(len(y)))
Shape of feature matrix is (6000, 3000)
There are 6000 number of samples
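The loading code above calls `.item()` on the result of `np.load`, which suggests the sample file stores a Python dictionary with keys "X" and "y". The sketch below reproduces that storage pattern on a tiny made-up dataset, so you can prepare your own data in the same layout. The array sizes and the file name `toy_data.npy` are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy dataset, stored as a dict with the same keys the example reads.
rng = np.random.default_rng(0)
data = {"X": rng.standard_normal((6, 3)), "y": rng.standard_normal(6)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_data.npy")
    np.save(path, data)  # the dict is pickled into a 0-dimensional object array
    loaded = np.load(path, allow_pickle=True)  # pickled content requires allow_pickle=True

restored = loaded.item()  # unwrap the 0-d array to get the dict back
X_toy, y_toy = restored["X"], restored["y"]
print("Shape of feature matrix is", X_toy.shape)
print("Number of samples:", len(y_toy))
```

Because `allow_pickle=True` executes pickled content, only load files from sources you trust.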
Apply OKRidge Software
k = 10 # cardinality constraint
lambda2 = 0.1 # l2 regularization parameter
gap_tol = 1e-4 # optimality gap tolerance
verbose = True # print out the progress
time_limit = 180 # time limit in seconds
BnB_optimizer = BNBTree(X=X, y=y, lambda2=lambda2)
upper_bound, betas, optimality_gap, max_lower_bound, running_time = BnB_optimizer.solve(
    k=k, gap_tol=gap_tol, verbose=verbose, time_limit=time_limit
)
Using max memory (300 GB)
using breadth-first search
'l' -> level(depth) of BnB tree, 'd' -> best dual bound, 'u' -> best upper(primal) bound, 'g' -> optimiality gap, 't' -> time
l: 0, d: -659444.6478942606, u: -657798.0645024217, g: 0.0025031746, t: 2.66743 s
l: 1, d: -659387.8270570670, u: -657798.0645024217, g: 0.0024167942, t: 5.36963 s
l: 2, d: -659317.8396916100, u: -657798.0645024217, g: 0.0023103978, t: 8.06236 s
l: 3, d: -659250.2763933846, u: -657798.0645024217, g: 0.0022076865, t: 10.70843 s
l: 4, d: -659169.0430832551, u: -657798.0645024217, g: 0.0020841937, t: 13.27604 s
l: 5, d: -659066.9599284586, u: -657798.0645024217, g: 0.0019290045, t: 15.77262 s
l: 6, d: -658941.7627230188, u: -657798.0645024217, g: 0.0017386768, t: 18.27392 s
l: 7, d: -658798.4741379283, u: -657798.0645024217, g: 0.0015208461, t: 20.69671 s
l: 8, d: -658604.8032165651, u: -657798.0645024217, g: 0.0012264231, t: 23.12765 s
l: 9, d: -658331.2630668270, u: -657798.0645024217, g: 0.0008105809, t: 25.51869 s
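The 'g' column in the log above is numerically consistent with a relative gap of the form (u - d) / |u|. The sketch below checks this against two logged rows; the formula is inferred from the printed numbers, not taken from the okridge source, and `relative_gap` is a hypothetical helper.

```python
def relative_gap(upper_bound, lower_bound):
    """Relative optimality gap, assuming the form (u - d) / |u|."""
    return (upper_bound - lower_bound) / abs(upper_bound)


u = -657798.0645024217  # best upper (primal) bound from the log

# (level, dual bound, logged gap) copied from the rows l: 0 and l: 9
rows = [
    (0, -659444.6478942606, 0.0025031746),
    (9, -658331.2630668270, 0.0008105809),
]

for level, d, logged_gap in rows:
    gap = relative_gap(u, d)
    matches = abs(gap - logged_gap) < 1e-9
    print("level", level, "computed gap", gap, "matches log:", matches)
```

Under this reading, the solver stops once the gap falls below `gap_tol`, or when the lower bound meets the upper bound and the gap reaches zero.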
print("Loss of best solution is", upper_bound)
print("Best lower bound is", max_lower_bound)
print("indices of nonzero coefficients are", np.where(betas != 0)[0])
print("Optimality gap is {}%".format(optimality_gap * 100))
print("Running time is {} seconds".format(running_time))
Loss of best solution is -657798.0645024217
Best lower bound is -657798.0645024217
indices of nonzero coefficients are [ 0 300 600 900 1200 1500 1800 2100 2400 2700]
Optimality gap is 0.0%
Running time is 27.08906078338623 seconds
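Once the support is known, the coefficients on it can be cross-checked with a plain ridge fit restricted to those columns. The sketch below does this on a small made-up problem. It assumes the objective has the form beta'X'X beta - 2 y'X beta + lambda2 * ||beta||^2, with the constant y'y dropped, which would explain the negative loss values above; that form is an assumption and is not confirmed by this page. The helpers `ridge_on_support` and `objective` are hypothetical, not part of okridge.

```python
import numpy as np


def ridge_on_support(X, y, support, lambda2):
    """Closed-form ridge fit using only the columns listed in `support`."""
    X_s = X[:, support]
    gram = X_s.T @ X_s + lambda2 * np.eye(len(support))
    beta = np.zeros(X.shape[1])
    beta[support] = np.linalg.solve(gram, X_s.T @ y)
    return beta


def objective(X, y, beta, lambda2):
    """Assumed loss: beta'X'X beta - 2 y'X beta + lambda2 * ||beta||^2."""
    Xb = X @ beta
    return Xb @ Xb - 2 * (y @ Xb) + lambda2 * (beta @ beta)


# Hypothetical toy problem with 3 relevant features out of 8.
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((50, 8))
toy_support = np.array([1, 4, 6])
y_toy = X_toy[:, toy_support] @ np.array([2.0, -1.0, 3.0]) + 0.1 * rng.standard_normal(50)

beta_toy = ridge_on_support(X_toy, y_toy, toy_support, lambda2=0.1)
loss_toy = objective(X_toy, y_toy, beta_toy, lambda2=0.1)
print("indices of nonzero coefficients are", np.where(beta_toy != 0)[0])
print("Loss on the toy problem is", loss_toy)
```

With the variables from the example above, the equivalent call would be `ridge_on_support(X, y, np.where(betas != 0)[0], lambda2)`.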