[R] adding additional information to histogram

Raphael Bauduin rblists at gmail.com
Fri Jan 27 12:07:02 CET 2012


On Fri, Jan 27, 2012 at 9:51 AM, Jim Lemon <jim at bitwrit.com.au> wrote:
> On 01/27/2012 03:12 AM, Raphael Bauduin wrote:
>>
>> Hi,
>>
>> I am a beginner with R, and I think the answer to my question will
>> seem obvious, but after searching and trying without success I've
>> decided to post to the list.
>>
>> I am working with data loaded from a CSV file with these fields:
>>   order_id, item_value
>> As an order can have multiple items, an order_id may be present
>> multiple times in the CSV.
>>
>> I managed to compute the total value and the number of items for each
>> order:
>>
>>   oli<- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
>>   orders_values<- tapply(oli[[2]], oli[[1]], sum)
>>   items_per_order<- tapply(oli[[2]], oli[[1]], length)
>>
>> I then can display the histogram of the order values:
>>
>>   hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
>>
>> Now on this histogram, I would like to display the average number of
>> items of the orders in each group (defined with the breaks).
>> So for the bar of orders with value 0 to 10, I'd like to display the
>> average number of items of these orders.
>>
> Hi Raph,
> As this looks a tiny bit like homework, I'll only provide suggestions. You

This is absolutely not homework :-)
I'm learning R to try to get some information out of data from an e-commerce website.


> have the value and number of items for each order. What you need to do is to
> match them in groups. In order to do that, you want a factor that will show
> the group for each value-items pair. The "cut" function will give you such a
> factor, using the breaks above. You seem to understand the *apply functions,
> so you can use one of these to return the mean number of items for each
> value group. Alternatively, you could use the factor in the "by" function to
> get the mean number of items.
>
> You should now have a factor that can be sent to "table" to get the number
> of orders in each value range, and a vector of the corresponding mean
> numbers of items in each value grouping. Why you could even use the same
> trick to calculate the mean price of the orders in each value grouping...
>
> I would use "barplot" to display all this information, as it is a bit easier
> to place the mean number of items on the bars (if you check the return value
> for barplot).
>

Your suggestions helped me get the info I wanted. I still need to
fine-tune it, as I currently generate two separate barplots.
Here's what I've done, in case it can help someone in the future:

# Assign to each entry of orders_values the range it belongs to,
# according to the breaks passed as the second argument.
order_value_range<-cut(orders_values, c(10*0:20, 800))
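# Count the number of orders in each range:
orders_number_per_range <- tapply(orders_values, order_value_range, length)
# Roughly equivalent to table(order_value_range), except that tapply
# gives NA (not 0) for ranges that contain no orders.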

average_number_of_item_per_order_in_range <- tapply(items_per_order,
order_value_range, mean)

barplot(average_number_of_item_per_order_in_range,
        ylab="Average number of items", xlab="Order value")
barplot(orders_number_per_range, ylab="Number of orders", xlab="Order value")
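
Two details of the grouping step may be worth checking: cut() by default leaves the lowest break open, whereas hist() by default includes it, and tapply(..., length) and table() treat empty ranges differently. The sketch below illustrates both on made-up values; toy_values and toy_breaks are hypothetical stand-ins, not the real order data.

```r
# Hypothetical toy data standing in for orders_values (not from the real CSV).
toy_values <- c(0, 5, 10, 12, 35, 35.5)
toy_breaks <- 10 * 0:4  # 0, 10, 20, 30, 40

# Default cut(): intervals are (0,10], (10,20], ... so a value of exactly 0
# falls outside every interval and becomes NA.
default_range <- cut(toy_values, toy_breaks)

# include.lowest = TRUE closes the first interval, [0,10], which matches
# the default behaviour of hist().
closed_range <- cut(toy_values, toy_breaks, include.lowest = TRUE)

# tapply(..., length) returns NA for a range with no orders; table() returns 0.
counts_tapply <- tapply(toy_values, closed_range, length)
counts_table <- table(closed_range)

print(counts_tapply)
print(counts_table)
```

Whether an order value of exactly 0 can occur depends on the data; if it can, include.lowest = TRUE keeps it in the first range.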


The next step: combine the two barplots in one.
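
One possible way to do this, following Jim's hint about the return value of barplot(), is to draw the number of orders as bars and write the mean number of items above each bar with text(). This is only a sketch on made-up data: toy_values, toy_items and the other names are hypothetical stand-ins for orders_values and items_per_order.

```r
# Hypothetical toy data standing in for orders_values and items_per_order.
toy_values <- c(4, 8, 15, 17, 19, 33)
toy_items  <- c(1, 3, 2, 2, 5, 4)
toy_range  <- cut(toy_values, 10 * 0:4, include.lowest = TRUE)

# Number of orders and mean number of items in each value range.
orders_per_range <- as.vector(table(toy_range))
mean_items_per_range <- as.vector(tapply(toy_items, toy_range, mean))

# barplot() returns the x coordinates of the bar midpoints.
# The extra headroom in ylim leaves space for the labels.
bar_midpoints <- barplot(orders_per_range, names.arg = levels(toy_range),
                         ylab = "Number of orders", xlab = "Order value",
                         ylim = c(0, max(orders_per_range) * 1.2))

# Write the mean number of items just above each non-empty bar
# (empty ranges have an NA mean and are skipped).
has_orders <- !is.na(mean_items_per_range)
text(bar_midpoints[has_orders], orders_per_range[has_orders],
     labels = round(mean_items_per_range[has_orders], 1), pos = 3)
```

With the real data, the same calls should work after replacing the toy vectors and using the breaks c(10*0:20, 800).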

Thanks already for your help!


Raph

> Jim
>



-- 
Web database: http://www.myowndb.com
Free Software Developers Meeting: http://www.fosdem.org


