We want to calculate the similarity of several thousand texts. The number of texts can go up to 100,000. Each text is stored in its own .txt file, and each file is numbered: [login to view URL], [login to view URL], etc.
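As a rough illustration of the similarity step, here is a minimal sketch that assumes Jaccard similarity over lowercase word sets. The posting does not name a similarity measure, so this choice, and the helper names `tokens`, `jaccard` and `pairwise_similarity`, are assumptions for illustration only.

```python
from itertools import combinations


def tokens(text):
    """Lowercase word set; a deliberately simple stand-in for real preprocessing."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets, a value in [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(a & b) / len(a | b)


def pairwise_similarity(texts):
    """Return {(id_i, id_j): similarity} for every unordered pair of texts.

    `texts` maps a numeric file id to the file's content (assumed layout).
    """
    sets = {doc_id: tokens(body) for doc_id, body in texts.items()}
    return {
        (i, j): jaccard(sets[i], sets[j])
        for i, j in combinations(sorted(sets), 2)
    }


if __name__ == "__main__":
    sample = {1: "the cat sat", 2: "the cat ran", 3: "dogs bark"}
    for pair, score in pairwise_similarity(sample).items():
        print(pair, round(score, 2))
```

With 100,000 texts there are roughly 5 billion unordered pairs, so an exhaustive comparison like this one would probably need to be distributed across HPC nodes or replaced by an approximate method.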
After that, we want to extract the least similar texts, with two options: extract the x least similar texts, or extract all texts with a maximum similarity of n%.
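The two extraction options could be sketched as follows. This assumes that "maximum similarity of n%" means no two extracted texts are more similar than n%. Finding the largest such subset is equivalent to a maximum independent set problem, which is NP-hard in general, so both functions are greedy heuristics and not exact solutions. The function names and the `sim(a, b)` callable are hypothetical.

```python
def select_below_threshold(ids, sim, max_sim):
    """Greedily keep texts so that no kept pair has similarity above max_sim."""
    kept = []
    for doc in ids:
        if all(sim(doc, other) <= max_sim for other in kept):
            kept.append(doc)
    return kept


def select_x_least_similar(ids, sim, x):
    """Greedy farthest-first pick of x texts.

    Starts from the first id, then repeatedly adds the text whose highest
    similarity to the already kept texts is lowest.
    """
    ids = list(ids)
    if x <= 0 or not ids:
        return []
    kept = [ids[0]]
    remaining = ids[1:]
    while remaining and len(kept) < x:
        best = min(remaining, key=lambda d: max(sim(d, k) for k in kept))
        kept.append(best)
        remaining.remove(best)
    return kept


if __name__ == "__main__":
    # Toy similarity values for three texts (made up for the example).
    scores = {
        frozenset({1, 2}): 0.9,
        frozenset({1, 3}): 0.1,
        frozenset({2, 3}): 0.2,
    }
    sim = lambda a, b: scores[frozenset({a, b})]
    print(select_below_threshold([1, 2, 3], sim, 0.5))
    print(select_x_least_similar([1, 2, 3], sim, 2))
```

The result of both heuristics depends on the order of `ids`, so a different starting text can give a different selection.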
A table must be generated showing the number of texts that can be extracted at a maximum similarity of x%, with x going from 0 to 100 in increments of 1.
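One possible way to produce such a table is sketched below. It assumes a greedy selection in which a text is kept only if its similarity to every already kept text is at most x%; the counts are therefore those of a heuristic, not guaranteed maxima. The function name `extractable_counts` and the `sim(a, b)` callable returning a value in [0, 1] are hypothetical.

```python
def extractable_counts(ids, sim):
    """Return [(x, count)] for x = 0..100.

    count is the size of a greedy selection in which no kept pair of texts
    has a similarity above x percent.
    """
    ids = list(ids)
    table = []
    for x in range(101):
        limit = x / 100
        kept = []
        for doc in ids:
            if all(sim(doc, other) <= limit for other in kept):
                kept.append(doc)
        table.append((x, len(kept)))
    return table


if __name__ == "__main__":
    # Toy similarity values for three texts (made up for the example).
    scores = {
        frozenset({1, 2}): 0.9,
        frozenset({1, 3}): 0.1,
        frozenset({2, 3}): 0.2,
    }
    sim = lambda a, b: scores[frozenset({a, b})]
    print("max similarity %\ttexts")
    for x, count in extractable_counts([1, 2, 3], sim):
        print(f"{x}\t{count}")
```

Each of the 101 rows reruns the selection, so for large collections it would make sense to compute the similarities once and reuse them across thresholds.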
The tool must run on demand on an HPC system.
We are open to hiring several people to achieve this goal if necessary: a mathematician to write the calculation algorithm, a computational linguist, and someone experienced with HPC.
5 freelancers are bidding an average of $477 for this job
This is a default bid. We'll discuss the price later in the chat after reading your project. I can do this for you perfectly. I still have a few questions. Please leave a message on my chat so we can discuss this
Wonderful project! I read your project description in detail and I am very interested in it. I'm a master of mathematics and I have a lot of experience using MATLAB, so I can do your project on time
What is the operating system of the HPC: Linux or Windows? I am very interested in your program, and I thought this task could not be done with MATLAB. I can finish the task on time, and provide later modification service within a mo