[R] R and Supervised learning

Luca Meyer lucam1968 at gmail.com
Mon Oct 2 13:37:04 CEST 2017


Hi,

I currently find myself manually selecting, among several hundred
Google Alerts (GA) texts, those that are actually relevant to my research
versus those that are not (even though they are triggered by relevant
search keywords).

Basically, each week I get several hundred GA emails such as:

https://www.dropbox.com/s/u7rp0ez1tamq001/Alerte%20Google%C2%A0-%20laitier%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

and

https://www.dropbox.com/s/1ubx5enw6tc90hj/Google%20Alert%20-%20latte%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

From such emails I create a file such as:

https://www.dropbox.com/s/y5yqcsxp1zcmnhc/test_sample.xlsx?dl=0

This is becoming a really time-consuming procedure, hence my decision
to try applying artificial intelligence solutions to it.

What I would really need are 2 separate steps:

(1) A procedure that reads the GA emails and creates a file such as the
Excel file I have shared here (only the first 3 columns)

(2) Some sort of supervised learning algorithm that can learn by example
from my choices and decide on my behalf (see column 4 in the attached
file). That is: taking the output from step (1) above, I can classify a few
hundred cases and then let the algorithm learn and classify
future/additional data. I plan to regularly review the classification,
correct misclassifications and train the algorithm again, with the
objective of improving its ability to correctly classify the GA texts.
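
To make step (2) more concrete, the learn / classify / correct / retrain
loop could be sketched in base R roughly as follows. This is only a minimal
sketch under assumptions: the column names `text` and `relevant`, the toy
sentences, and the helper functions `tokenize`, `train_nb` and `predict_nb`
are all hypothetical and not taken from the shared spreadsheet, and
multinomial naive Bayes is just one simple choice of classifier.

```r
# Hypothetical labelled sample standing in for the spreadsheet: one text
# per row plus the manual relevant / not relevant decision.
train <- data.frame(
  text = c("milk price rises for dairy farmers",
           "dairy cooperative reports milk output",
           "celebrity shares almond latte recipe",
           "new coffee shop opens downtown"),
  relevant = c("yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Split one text into lower-case word tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^[:alpha:]]+")[[1]]
  words[nzchar(words)]
}

# Fit a multinomial naive Bayes model with add-one (Laplace) smoothing.
train_nb <- function(texts, labels) {
  tokens  <- lapply(texts, tokenize)
  vocab   <- sort(unique(unlist(tokens)))
  classes <- sort(unique(labels))
  prior   <- table(factor(labels, levels = classes)) / length(labels)
  # One column of smoothed log word probabilities per class.
  log_lik <- lapply(classes, function(cl) {
    counts <- as.vector(table(factor(unlist(tokens[labels == cl]),
                                     levels = vocab)))
    log((counts + 1) / (sum(counts) + length(vocab)))
  })
  list(classes   = classes,
       vocab     = vocab,
       log_prior = setNames(log(as.vector(prior)), classes),
       log_lik   = matrix(unlist(log_lik), nrow = length(vocab),
                          dimnames = list(vocab, classes)))
}

# Assign each new text to the class with the highest log posterior score.
predict_nb <- function(model, texts) {
  vapply(texts, function(txt) {
    words <- tokenize(txt)
    words <- words[words %in% model$vocab]  # words never seen are ignored
    scores <- model$log_prior +
      colSums(model$log_lik[words, , drop = FALSE])
    model$classes[which.max(scores)]
  }, character(1), USE.NAMES = FALSE)
}

model <- train_nb(train$text, train$relevant)
pred  <- predict_nb(model, c("milk dairy farmers", "latte recipe"))
print(pred)

# Review step: append a manually corrected case and train again.
train  <- rbind(train, data.frame(text = "latte art contest winners",
                                  relevant = "no",
                                  stringsAsFactors = FALSE))
model2 <- train_nb(train$text, train$relevant)
```

With real data the labelled rows would be read from the file produced in
step (1), and the periodic review amounts to appending the corrected rows
and calling the training function again, as in the last lines above.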

Is my explanation clear enough? Can all of the above be done within R? If so,
is there any package/procedure I should be using?

Thank you in advance for any suggestion you might have.

Luca
