Amazon SageMaker gives you the flexibility you need to address complex business problems with your machine learning workloads. Built-in algorithms let you get started quickly. This blog post outlines how to extend the built-in factorization machines algorithm to predict top x recommendations.
This method is ideal when you generate a fixed number of recommendations for users in batch. For example, you can use it to generate the top 20 products a user is likely to buy, drawing on a large pool of user and product purchase information. You can then store the recommendations in a database for downstream use, such as displaying them on a dashboard or personalizing email marketing. You can also automate the procedure outlined in this post with AWS Batch or AWS Step Functions to retrain and predict on a regular schedule.
Factorization machines are a general-purpose supervised learning algorithm that can be used for both classification and regression tasks. The algorithm was designed as an engine for recommendation systems. It extends the collaborative filtering approach by learning a second-order function whose pairwise coefficients are restricted to a low-rank structure. This restriction avoids overfitting and scales very well, which makes it a good fit for large datasets: a typical recommendation problem with millions of input features still requires only millions of parameters.
The model equation of the factorization machine is defined as:

y(x) = w0 + sum_{i=1..n} w_i x_i + sum_{i=1..n} sum_{j=i+1..n} <v_i, v_j> x_i x_j

The following model parameters are estimated: the global bias w0 (a scalar), the linear weights w (an n-dimensional vector), and the factor matrix V (n × k), whose i-th row is the latent vector v_i. Here, n is the input size and k is the size of the latent space. The procedure below uses these estimated model parameters to extend the model.
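As a sanity check on the equation above, here is a minimal NumPy sketch (not the SageMaker implementation) that scores one input row; the pairwise term uses the standard O(nk) reformulation, and all numbers are made up for illustration:

```python
import numpy as np

def fm_score(x, b, w, V):
    """Factorization machine score for one input row x of length n.

    b: scalar bias, w: (n,) linear weights, V: (n, k) factor matrix.
    Uses the standard O(nk) identity:
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * (||V^T x||^2 - sum_i ||v_i||^2 x_i^2)
    """
    vx = V.T @ x
    pairwise = 0.5 * (vx @ vx - np.sum((V ** 2).T @ (x ** 2)))
    return b + w @ x + pairwise

# Made-up parameters: n = 3 input features, k = 2 latent dimensions.
b = 0.1
w = np.array([0.2, -0.1, 0.4])
V = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
x = np.array([1.0, 0.0, 1.0])  # e.g., one-hot user + one-hot item
print(fm_score(x, b, w, V))    # ≈ 0.7; the pairwise term is 0 because <v_0, v_2> = 0
```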
Using the Amazon SageMaker factorization machines algorithm, you can predict a score for a pair, such as a user and an item, based on how well the pair matches. When applying a recommendation model, you often want to return the top x items that best match a user's preferences. If the total number of items is small, you can query the model with every possible user-item pair. However, this approach doesn't scale well as the number of items grows. In this scenario, you can use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to speed up the top x prediction task.
The figure below shows a high-level overview of the procedure covered in this blog post. It includes building a factorization machine model, repackaging the model data, fitting a k-NN model, and predicting the top x items.
You can also download the companion Jupyter notebook to follow along. Each of the following sections corresponds to a section in the notebook, so you can execute the code for each step as you read.
See Part 1 of the companion Jupyter notebook for the procedure to build a factorization machine model. For more information on building factorization machine models, see the factorization machines algorithm documentation.
The Amazon SageMaker factorization machines algorithm is built on the Apache MXNet deep learning framework. This section covers how to repackage the model data using MXNet.
First, download the factorization machine model, extract it, and create an MXNet object from it. The main purpose of the MXNet object is to extract the model data.
# Download FM model
os.system('aws s3 cp ' + {Model location} + ' ./')
# Extract files from the model. Note: the companion notebook outlines the extraction steps.
The factorization machine model takes as input a list of vectors x_u + x_i, each representing a user u and an item i tied together by a label, such as the user's rating of a movie. The resulting input matrix contains one-hot encoded sparse values for the users, the items, and any other features you need to add.
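For illustration, here is a small sketch (with made-up sizes; `encode_pair` is a hypothetical helper, not part of the notebook) of how one user-movie pair becomes such a sparse one-hot row:

```python
import numpy as np

# Hypothetical sizes for illustration (the real values come from your dataset).
nb_users, nb_movies = 4, 5

def encode_pair(user_id, movie_id):
    """One-hot encode a (user u, item i) pair into a single input row x_u + x_i.

    Positions [0, nb_users) hold the user indicator; positions
    [nb_users, nb_users + nb_movies) hold the movie indicator.
    """
    x = np.zeros(nb_users + nb_movies, dtype=np.float32)
    x[user_id] = 1.0
    x[nb_users + movie_id] = 1.0
    return x

# User 1 rated movie 3; the label (the rating itself) is kept separately.
print(encode_pair(user_id=1, movie_id=3))
```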
The output of the factorization machine model consists of three N-dimensional arrays (ndarrays): V, the (N × k) factor matrix; w, the N-dimensional vector of linear weights; and b, the bias term.
Complete the following steps to extract the model data from the MXNet object.
# Extract model data
m = mx.module.Module.load('./model', 0, False, label_names=['out_label'])
V = m._arg_params['v'].asnumpy()
w = m._arg_params['w1_weight'].asnumpy()
b = m._arg_params['w0_weight'].asnumpy()
Next, repackage the model data extracted from the factorization machine model to build the k-NN model. This process creates two datasets: an item latent matrix, used to fit the k-NN model, and a user latent matrix, used for inference.
nb_users = 
nb_movies = 

# item latent matrix - concat(V[i], w[i])
knn_item_matrix = np.concatenate((V[nb_users:], w[nb_users:]), axis=1)
knn_train_label = np.arange(1, nb_movies + 1)

# user latent matrix - concat(V[u], 1)
ones = np.ones(nb_users).reshape((nb_users, 1))
knn_user_matrix = np.concatenate((V[:nb_users], ones), axis=1)
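The reason this repackaging works: for a user u and item i, the inner product of the two repackaged rows is ⟨V[u], V[i]⟩ + w[i], which matches the factorization machine's score for that pair up to the bias b and the user's own linear term w[u], both constant for a fixed user and therefore irrelevant to the item ranking. A small self-contained sketch with random numbers:

```python
import numpy as np

k = 2
nb_users, nb_movies = 3, 4
rng = np.random.default_rng(0)
V = rng.normal(size=(nb_users + nb_movies, k))   # factor rows: users first, then movies
w = rng.normal(size=(nb_users + nb_movies, 1))   # linear weights

# Repackage exactly as in the snippet above.
knn_item_matrix = np.concatenate((V[nb_users:], w[nb_users:]), axis=1)
ones = np.ones(nb_users).reshape((nb_users, 1))
knn_user_matrix = np.concatenate((V[:nb_users], ones), axis=1)

# Inner product of repackaged rows == <V[u], V[i]> + w[i].
u, i = 1, 2
inner = knn_user_matrix[u] @ knn_item_matrix[i]
direct = V[u] @ V[nb_users + i] + w[nb_users + i, 0]
print(np.isclose(inner, direct))   # the two computations agree
```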
Upload the k-NN input data to Amazon S3, then create and save the k-NN model so that it can be used with Amazon SageMaker. Saving the model also lets you reference it when invoking batch transform, as described in the next step.
The k-NN model uses the default index_type (faiss.Flat). This index is exact, but it can be slow for very large datasets. In such cases, you can use a different index_type parameter to get an approximate but faster answer. For more information on index types, see the k-NN documentation or the Amazon SageMaker k-NN example notebooks.
# upload data
knn_train_data_path = writeDatasetToProtobuf(knn_item_matrix, bucket, knn_prefix, train_key, "dense", knn_train_label)

# set up the estimator
nb_recommendations = 100
knn = sagemaker.estimator.Estimator(
    get_image_uri(boto3.Session().region_name, "knn"),
    get_execution_role(),
    train_instance_count=1,
    train_instance_type=instance_type,
    output_path=knn_output_prefix,
    sagemaker_session=sagemaker.Session())

# set up hyperparameters
knn.set_hyperparameters(
    feature_dim=knn_item_matrix.shape[1],
    k=nb_recommendations,
    index_metric="INNER_PRODUCT",
    predictor_type='classifier',
    sample_size=nb_movies)

fit_input = {'train': knn_train_data_path}
knn.fit(fit_input)
knn_model_name = knn.latest_training_job.job_name
print("created model: ", knn_model_name)

# save the model so that you can reference it in the next step during batch inference
sm = boto3.client(service_name='sagemaker')
primary_container = {
    'Image': knn.image_name,
    'ModelDataUrl': knn.model_data,
}
knn_model = sm.create_model(
    ModelName=knn.latest_training_job.job_name,
    ExecutionRoleArn=knn.role,
    PrimaryContainer=primary_container)
You can make large-scale predictions with the Amazon SageMaker batch transform feature. In this example, start by uploading the user inference input data to Amazon S3, then trigger the batch transform job.
# upload inference data to S3
knn_batch_data_path = writeDatasetToProtobuf(knn_user_matrix, bucket, knn_prefix, train_key, "dense")
print("Batch inference data path: ", knn_batch_data_path)

# Initialize the transformer object
transformer = sagemaker.transformer.Transformer(
    base_transform_job_name="knn",
    model_name=knn_model_name,
    instance_count=1,
    instance_type=instance_type,
    output_path=knn_output_prefix,
    accept="application/jsonlines; verbose=true")

# Start a transform job
transformer.transform(knn_batch_data_path, content_type='application/x-recordio-protobuf')
transformer.wait()

# Download output file from S3
s3_client.download_file(bucket, inference_output_file, results_file_name)
The output file contains the predictions for all users. Each line of the output file is a JSON line that lists the recommended item IDs and their distances for a specific user.
The following is an example of the output for one user. You can save the recommended movie IDs in a database for further use.
Recommended movie IDs for user #1 : [509, 1007, 96, 210, 208, 505, 268, 429, 182, 189, 57, 132, 482, 165, 615, 527, 196, 269, 528, 83, 176, 166, 194, 520, 661, 246, 180, 659, 496, 173, 9, 435, 474, 192, 493, 48, 211, 656, 489, 181, 251, 124, 89, 510, 22, 183, 316, 185, 197, 23, 170, 168, 963, 190, 1039, 56, 79, 136, 519, 651, 484, 275, 654, 641, 523, 478, 302, 223, 313, 187, 1142, 134, 100, 498, 272, 285, 191, 515, 408, 178, 199, 114, 480, 603, 172, 169, 174, 427, 513, 657, 318, 357, 511, 12, 50, 127, 479, 98, 64, 483]Movie distances for user #1 : [1.8703, 1.8852, 1.8933, 1.905, 1.9166, 1.9185, 1.9206, 1.9239, 1.928, 1.9304, 1.9411, 1.9452, 1.947, 1.9528, 1.963, 1.975, 1.9985, 2.0117, 2.0205, 2.0211, 2.0227, 2.0583, 2.0959, 2.0986, 2.1064, 2.1126, 2.1157, 2.119, 2.1208, 2.124, 2.1349, 2.1356, 2.1413, 2.1423, 2.1521, 2.1577, 2.1618, 2.176, 2.1819, 2.1879, 2.1925, 2.2463, 2.2565, 2.2654, 2.2979, 2.3289, 2.3366, 2.3398, 2.3617, 2.3654, 2.3855, 2.386, 2.3867, 2.4198, 2.4431, 2.46, 2.462, 2.4643, 2.4729, 2.4959, 2.5334, 2.5359, 2.5362, 2.542, 2.5428, 2.5934, 2.5953, 2.598, 2.6575, 2.6735, 2.6879, 2.7038, 2.7259, 2.7432, 2.8112, 2.8707, 2.871, 2.9378, 2.9728, 3.0175, 3.0231, 3.0254, 3.0259, 3.0325, 3.0414, 3.1033, 3.2729, 3.3406, 3.392, 3.3982, 3.4196, 3.4452, 3.4684, 3.4743, 3.6265, 3.7013, 3.7711, 3.7736, 3.8898, 4.0698]
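A small sketch of how such an output file can be parsed, assuming each JSON line carries `labels` (the recommended item IDs) and `distances` fields as in the verbose jsonlines format, and that line order matches the input user order:

```python
import json

def parse_predictions(results_file_name):
    """Parse batch transform output into {user_index: (movie_ids, distances)}.

    Assumes each JSON line has 'labels' (recommended item IDs) and
    'distances' fields, in the same order as the input users.
    """
    recommendations = {}
    with open(results_file_name) as f:
        for user_index, line in enumerate(f):
            record = json.loads(line)
            recommendations[user_index] = (record['labels'], record['distances'])
    return recommendations
```

Cached this way, each user's list can be written to a database row keyed by user ID.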
The framework in this blog post applies to scenarios with user and item IDs. However, your data may contain additional information about the users and items. For example, you might know the user's age, postal code, or gender, and for items you might know the category, movie genre, or important keywords in the text description. For these multi-feature and categorical scenarios, you can derive the user and item vectors as follows, where each sum runs over the entity's active features:

a_i = concat( sum_{j in item} v_j x_j , sum_{j in item} w_j x_j )
a_u = concat( sum_{j in user} v_j x_j , 1 )

Then build the k-NN model with a_i and use a_u for inference.
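A minimal sketch of this generalization, assuming `V_item`/`w_item` are the parameter rows for the item's features and `x_item`/`x_user` are the (possibly multi-hot) feature indicator vectors; with one-hot inputs this reduces to the concat(V[i], w[i]) and concat(V[u], 1) construction used earlier:

```python
import numpy as np

def item_vector(V_item, w_item, x_item):
    """a_i = concat(sum_j v_j x_j, sum_j w_j x_j), summed over item features."""
    return np.concatenate((V_item.T @ x_item, [w_item @ x_item]))

def user_vector(V_user, x_user):
    """a_u = concat(sum_j v_j x_j, 1), summed over user features."""
    return np.concatenate((V_user.T @ x_user, [1.0]))

# Toy example: three item features (e.g., movie ID plus two genre flags), k = 2.
V_item = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
w_item = np.array([0.2, -0.1, 0.4])
x_item = np.array([1.0, 1.0, 0.0])     # multi-hot: movie + one genre
print(item_vector(V_item, w_item, x_item))   # → [1.5 0.5 0.1]
```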
Amazon SageMaker gives developers and data scientists the flexibility to build, train, and deploy machine learning models quickly. Using the framework above, you can build a recommender system that predicts the top x items for users in batch and caches the output in a database. In some cases, you may need to filter the predictions further, for example removing certain predictions on the fly based on user responses. Because the framework is highly flexible, you can adapt it to such use cases.
Zohar Karnin is a Principal Scientist in Amazon AI. His research interests are in the area of large-scale, online machine learning algorithms. He develops infinitely scalable machine learning algorithms for Amazon SageMaker.
Rama Thamman is a Senior Solutions Architect on the Strategic Accounts team. He works with customers to build scalable cloud and machine learning solutions on AWS.