The growing amount of information produced by existing data sources requires data flows to be processed efficiently and in real time. Because of the large volume of data, applications that process this information to make informed decisions and to detect situations of interest usually place heavy demands on computational resources.

This website accompanies our work "Trading Accuracy for Performance in Data Processing Applications" [1]. That work proposes approximate solutions for processing large information flows, which reduce the amount of data to be processed, and it estimates the accuracy of these solutions based on the type of query and the distribution of the data. Accuracy is measured in terms of the precision and recall of the results obtained.
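To make the precision and recall measures concrete, here is a minimal Python sketch that compares an exact threshold query ("products ordered more than T times") with an approximate version that processes only a random sample of the input. The sampling strategy (keep each order with probability `rate` and scale the threshold by the same factor) and all function names are illustrative assumptions. They are not necessarily the technique described in [1].

```python
import random
from collections import Counter


def frequent_products(orders, threshold):
    """Exact query: products ordered more than `threshold` times."""
    counts = Counter(orders)
    return {p for p, c in counts.items() if c > threshold}


def approx_frequent_products(orders, threshold, rate, seed=0):
    """Approximate query (assumed strategy): keep each order with
    probability `rate`, then compare sample counts against a threshold
    scaled by the same rate."""
    rng = random.Random(seed)
    sample = [o for o in orders if rng.random() < rate]
    counts = Counter(sample)
    return {p for p, c in counts.items() if c > threshold * rate}


def precision_recall(approx, exact):
    """Precision: fraction of approximate results that are correct.
    Recall: fraction of exact results that the approximation found.
    By convention here, an empty set yields 1.0 for the corresponding measure."""
    true_positives = len(approx & exact)
    precision = true_positives / len(approx) if approx else 1.0
    recall = true_positives / len(exact) if exact else 1.0
    return precision, recall
```

Lowering `rate` reduces the number of orders processed, at the cost of possibly missing true results (lower recall) or reporting spurious ones (lower precision).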

The implementation prototype, all artifacts of the experiments, and the charts produced from these experiments are available in our GitHub repository [2].

Amazon Case Study

For the case study, let us consider a simplified version of the Amazon ordering service, in which we want to identify the following situations of interest:

  • CreateAdCampaign: creates an isPublicized link between an advertising campaign and a product if the product has been ordered more than 1000 times during the campaign period.
  • UnpopularStock: returns all products that were ordered by fewer than 3 customers during the last month.
  • RelatedProducts: creates an isRelatedTo link between two products if they appeared together in the same order at least 100 times during the last month.
  • OlympicGamesTrending: creates an isPublicized link between the ad campaign 'Olympic Games' and the products that have been ordered at least 100 times in Rio de Janeiro from the beginning of August 2016 until the end of the event.
  • RecommendsPack: if a customer has ordered Product1 in at least 5 different orders during the last month and this product is related to Product2 (isRelatedTo link), then an offer for Product2 with priority 1 (the highest priority) is created for the customer. If Product1 is related to Product3 indirectly, i.e., through an intermediate product (Product1 is related to ProductX, which is related to Product3), then an offer for Product3 with priority 2 is created for the customer. In this case, we say that Product1 is related to Product3 in two hops. Similarly, if Product1 is related to ProductN in n hops, the query creates an offer with priority n. In this query, we consider offers with priorities 1 to 3.
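As an illustration of how two of these queries could be evaluated, the following Python sketch computes RelatedProducts links from a list of orders and then derives RecommendsPack offers with a breadth-first search over those links. It is a minimal sketch under assumptions that are not part of the case study: orders are (customer, products) pairs already restricted to the last month, isRelatedTo links are treated as undirected, the hop count n is the shortest path length, and when several ordered products lead to the same offer the highest priority (lowest n) is kept. The function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def related_products(orders, min_count=100):
    """RelatedProducts sketch: link two products that appear together in at
    least `min_count` orders. Returns an undirected adjacency map (assumption)."""
    pair_counts = Counter()
    for _customer, products in orders:
        for a, b in combinations(sorted(set(products)), 2):
            pair_counts[(a, b)] += 1
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def recommends_pack(orders, graph, min_orders=5, max_hops=3):
    """RecommendsPack sketch: for each product a customer ordered in at least
    `min_orders` different orders, offer every product reachable in n hops
    (1 <= n <= max_hops) with priority n. Returns {(customer, product): n}."""
    orders_per_product = Counter()
    for customer, products in orders:
        for p in set(products):  # count each product once per order
            orders_per_product[(customer, p)] += 1

    offers = {}
    for (customer, start), n_orders in orders_per_product.items():
        if n_orders < min_orders:
            continue
        # Breadth-first search: the first visit gives the shortest hop count.
        hops = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if hops[current] == max_hops:
                continue  # do not expand beyond the maximum priority
            for neighbour in graph.get(current, ()):
                if neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    queue.append(neighbour)
        for product, n in hops.items():
            if n == 0:
                continue  # skip the ordered product itself
            key = (customer, product)
            # Keep the highest priority (lowest n) if reached from several products.
            offers[key] = min(offers.get(key, n), n)
    return offers
```

Because breadth-first search visits products in increasing order of hop distance, the hop count recorded on the first visit is directly the offer priority.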


[1] Gala Barquero, Javier Troya, Antonio Vallecillo: Trading Accuracy for Performance in Data Processing Applications. Submitted.
[2] Approximate Transformation Git repository, 2019. https://github.com/atenearesearchgroup/approximateTransformation. Accessed: March 2019.