Algorithm to determine neighbor's position (N S E W) of parcels?


After my first post, "Determinate Parcel Neighbors using PYTHON", where I was looking for a faster way to calculate parcel neighbors and their positions, I got it working, but the results were not complete or 100% reliable because of the complexity of the shapes. Mr. Richard Fairhurst therefore suggested that I start a new thread and look for a more efficient algorithm that gives more complete and reliable results!

So I suggest the following algorithm: calculate the centroid of each polygon, then make multiple projections of this point onto the intersection between the two polygons (the target polygon and the neighbor polygon).

Then calculate the bearing of each line and associate the bearings with the compass points: [45-135] = North, [135-225] = West, [225-315] = South, [315-359.9… , 0-45] = East…
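The bearing-to-compass step could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: the angle is taken counterclockwise from the +x (east) axis, which is what the ranges above imply; the boundary values (45, 135, ...) are assigned to the range that starts there; and the names `angle_deg`, `compass_point` and `neighbor_directions` are hypothetical helpers, not arcpy functions.

```python
import math


def angle_deg(x0, y0, x1, y1):
    """Angle of the line (x0, y0) -> (x1, y1) in degrees, in [0, 360).

    ASSUMPTION: measured counterclockwise from the +x (east) axis.
    """
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0


def compass_point(angle):
    """Map an angle to N/W/S/E using the ranges given in the question."""
    a = angle % 360.0
    if 45.0 <= a < 135.0:
        return "N"
    if 135.0 <= a < 225.0:
        return "W"
    if 225.0 <= a < 315.0:
        return "S"
    return "E"  # [315, 360) and [0, 45)


def neighbor_directions(centroid, projected_points):
    """Set of compass points covered by the lines centroid -> projected point."""
    cx, cy = centroid
    return {compass_point(angle_deg(cx, cy, px, py))
            for px, py in projected_points}


# Hypothetical coordinates: one point straight below the centroid and one
# slightly below the east axis.
print(neighbor_directions((0.0, 0.0), [(0.0, -5.0), (4.0, -1.0)]))
```

Returning a set lets a neighbor be reported in more than one direction (for example both South and East) when its projected points fall in different ranges.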

So my questions are:

  • What do you think of the algorithm: is it robust, or can it get stuck…
  • Do you have any way or idea how to make the projections explained before?

For the rest, I have already done it, as you can see in the earlier post mentioned above!

I am using ArcGIS 10.1 and Python.


For the example shown, polygon 185P1 would obviously qualify as South of polygon 187. It is hard to tell whether it would also qualify as East, but the bearings from those lines seem like they would provide enough information to determine whether or not the neighbor should also be considered East. Clearly polygon 185P would not be considered North or West of polygon 187 if you tested the bearings of the lines shown, so that seems to fit what you want. In any case, for the example it seems that testing those bearings would provide the best answer for determining the set of relative compass directions that describe the relationship of polygon 187 to polygon 185P1. So this method seems sound.

In your previous post you indicated that you wanted to include polygons that did not directly touch as neighbors if there was no polygon between them. Is that still true or not? Is that what you mean by "Do you have any way or idea how to make a projections explained before?"?

Do you want polygon 185P1 to consider 185P2 to be a neighbor or not? They seem close enough together to use a buffering tolerance as a test. Do you want polygon 187 to consider polygon T18188/44 a neighbor or not? They are not very close and would create more challenges to evaluate based on a buffer tolerance. If you want these polygon pairs to be treated as neighbors, then the challenges would be how to first identify the set of polygons to test for this condition and then how to identify which points to use as representing the neighboring shared edge that are not blocked by another polygon.


Computing the Ising Model for NiO

I am trying to compute the Ising model for NiO. As O carries no magnetic moment, I only need to consider the case of Ni, which requires a second nearest neighbour Ising model. As can be seen in the figure below, the Ni atoms interact with their nearest neighbours with a coupling constant J1 = 2.3 meV and their second nearest neighbours with a coupling constant J2 = -21 meV.

I have created some code that generates a matrix that alternates 1 and -1 (spin up/spin down) in every second entry and 0 for every other entry (representing oxygen). I have also defined functions that will flip the spin for every nearest neighbour and second nearest neighbour. As the dominating coupling constant J2 < 0, the system should be antiferromagnetic, so the spins should align diagonally, repeating the pattern (1,0,-1,0).
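One way to sanity-check the energy function before running any Monte Carlo is to compare the energy of the expected ordered pattern with that of a fully aligned lattice. The sketch below is not the code from the question; it assumes a 2D layout with Ni on sites where (i + j) is even and O (value 0) elsewhere, diagonal Ni-Ni nearest neighbours, Ni-O-Ni second neighbours along rows and columns, periodic boundaries, and the sign convention H = -Σ J s_i s_j. All function names are hypothetical.

```python
# Couplings quoted in the question (meV).
J1 = 2.3     # nearest-neighbour Ni-Ni coupling
J2 = -21.0   # second-nearest-neighbour Ni-Ni coupling (through O)

# ASSUMED 2D layout: Ni on sites with (i + j) even, O (value 0) elsewhere.
NN_OFFSETS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # diagonal Ni-Ni
NNN_OFFSETS = [(2, 0), (-2, 0), (0, 2), (0, -2)]    # Ni-O-Ni along rows/columns


def neighbour_sum(lattice, i, j, offsets):
    """Sum of the spins at the given offsets, with periodic boundaries."""
    n, m = len(lattice), len(lattice[0])
    return sum(lattice[(i + di) % n][(j + dj) % m] for di, dj in offsets)


def site_energy(lattice, i, j):
    """Energy of site (i, j) under the ASSUMED convention H = -sum J s_i s_j."""
    s = lattice[i][j]
    return -s * (J1 * neighbour_sum(lattice, i, j, NN_OFFSETS)
                 + J2 * neighbour_sum(lattice, i, j, NNN_OFFSETS))


def total_energy(lattice):
    """Total energy; the factor 0.5 corrects for counting each pair twice."""
    return 0.5 * sum(site_energy(lattice, i, j)
                     for i in range(len(lattice))
                     for j in range(len(lattice[0])))


def make_lattice(n, antiferro=True):
    """n x n lattice (n a multiple of 4); row 0 reads 1, 0, -1, 0 if antiferro."""
    def spin(i, j):
        if (i + j) % 2:          # oxygen site
            return 0
        return (-1) ** ((i + j) // 2) if antiferro else 1
    return [[spin(i, j) for j in range(n)] for i in range(n)]


afm = make_lattice(4)
fm = make_lattice(4, antiferro=False)
print(afm[0])                                # first row of the ordered pattern
print(total_energy(afm), total_energy(fm))   # the ordered pattern should be lower
```

If the ordered pattern does not come out lower in energy with your own energy function, the neighbour offsets or the sign convention are the first things to check.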

However when I run the code, I am not able to achieve that. I can reach a certain amount of order at low temperatures (T

2) but not total ferromagnetism as can be seen below. Going lower (e.g. T

Any help would be greatly appreciated.


Abstract

In this paper, the single-vehicle static repositioning problem is studied. The objective of repositioning is to minimize the weighted sum of unmet customer demand and operational time on the vehicle route. To solve this problem, chemical reaction optimization (CRO) is proposed to handle the vehicle routes, and a subroutine is proposed to determine the loading and unloading quantities at each visited station. An enhanced version of CRO is proposed to improve the solution quality of the original CRO by adding new operators, rules, and intensive neighbor solution search methods. The concept of a neighbor-node set is proposed to narrow the solution search space. To illustrate the efficiency and accuracy of the enhanced CRO, different test scenarios are set and the results obtained from IBM ILOG CPLEX, the original CRO, and the enhanced CRO are compared. The computational results indicate that the enhanced CRO provides high-quality solutions with shorter computing times than those of IBM ILOG CPLEX and provides better solutions than the original CRO. The results also demonstrate that incorporation of the two neighbor-node sets into the enhanced CRO improves the solution quality, and the probability of running the intensive search should increase with iteration in the final part of the main stage of the algorithm to obtain better solutions.


Jellyfish dynamic routing protocol with mobile sink for location privacy and congestion avoidance in wireless sensor networks

A Wireless Sensor Network (WSN) is often viewed as a large collection of sensors that are structured and collaborate to gather and transmit information about targets. As sensors may be positioned in harsh surroundings, secure data transmission is critical. Therefore, a dynamic routing path is essential for WSN applications. In this paper, a Jellyfish Dynamic Routing Protocol (JDRP) is proposed for preserving location privacy and avoiding congestion with low guaranteed delay. With this routing technique, the complete sensor field is divided into different subdivisions and each subdivision elects a target area by computing its transmission distance. The backbone of the dynamic routing protocol consists of a virtual ring, whose nodes are called bell nodes, and radial lines, whose nodes are called tentacle nodes and which bring more nodes into the construction of the network. The number of radial lines and the radius of the virtual ring in a network are jointly determined to ease the communication path from a node to the sink. In this structure, the radial line paths are routed directionally and bell nodes are routed in angular directions probabilistically. Along the routing path, the tentacle nodes forward the data to a dynamic sink, which ensures that the information is collected with less delay and that an attacker cannot guess their positions. The experimental results show that the proposed JDRP method achieves enhanced performance in terms of energy consumption, packet delivery delay and lifetime.


Results

Characterization of mycobacterial isolates

Several mycobacterial species belonging to the M. avium complex are present in animal surroundings, each with a different capacity to cause illness (e.g., M. ap, M. avium) and potential to spread to humans (Alvarez-Uria, 2010). Before initiating our genome analysis of members of the M. avium complex, we searched our collection of mycobacterial isolates originating from diverse hosts and diverse tissues, as well as from environmental samples of dairy herds that might help in spreading the infection. Our selection scheme identified eight isolates that were subjected to further genotyping protocols to confirm their identity. Based on acid-fast staining and amplification of the 16S rRNA gene using mycobacteria-specific primers (Talaat et al., 1997), all eight isolates were shown to belong to the genus Mycobacterium. Moreover, typing based on the hsp65 gene (Smole et al., 2002) confirmed the identity of two mycobacterial isolates, DT 78 and Env 77, as M. avium subspecies avium (M. avium), while the rest of the isolates were all M. ap. Identification of sheep or cattle types of M. ap was based on IS1311 amplification followed by HinfI digestion (data not shown). All six M. ap isolates were of the bovine type (M. ap type II). A compiled list of all mycobacterial isolates used in this study and their origin is shown in Table 1.

Whole-genome sequencing of mycobacterial isolates

The Illumina sequencer generated an average read length of 50 nucleotides with an average coverage of 42× of each sequenced genome after reference assembly. The number of reads, mapped reads, and the length of the consensus sequence are all listed in Table 2. The revised version of the M. ap K-10 sequence (Wynne et al., 2010) and M. avium subspecies hominissuis (M. avium 104) were used as references for comparative genome assembly of the target isolates. As expected, all examined M. ap genomes showed a high sequence identity (up to 99%) to the M. ap K-10 genome. Lack of sequence coverage in some parts of the genome could explain some of the differences from the reference genome. Despite the presence of small deleted regions among M. ap genomes, only 2 gaps >1 kb were seen among M. ap genomes, including the one isolated from human (M. ap 4B isolate), suggesting a high level of similarity to the M. ap K-10 strain isolated from cattle. On the other hand, the M. avium DT 78 strain had only 87% sequence identity to the M. avium 104 genome while it had a higher similarity (93%) to the M. ap K-10 genome, despite its established genotype as an M. avium isolate. In the DT 78 genome, more gaps were present whether M. avium 104 or M. ap K-10 was used for reference alignment (Figure 1). The average gap size in this genome is

A whole-genome alignment of M. avium DT 78, M. avium 104 and M. ap DT 78. MAUVE algorithm (Darling et al., 2010) was used for the alignment of the three genomes where white areas indicate low coverage gaps in the sequence of M. avium DT 78 genome, and about seven large region Indels were identified in M. avium DT 78. Regions with the same color indicate high similarity and connected by same color bars. The genomes were drawn to scale based on the reference M. avium 104 genome.

Table 2

A summary report for CLC Bio reference assembly of M. avium and M. ap isolates.

| | ATCC 19698 | M. ap 4B | JTC 1281 | JTC 1285 | DT 3 | Env 210 | DT 78 |
|---|---|---|---|---|---|---|---|
| Reference organism | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. avium 104 |
| Reference length | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 5,475,491 |
| Total read count | 5,994,312 | 6,729,396 | 4,645,230 | 5,985,952 | 6,374,242 | 6,294,162 | 6,978,706 |
| Matched read count | 5,417,459 | 6,522,333 | 4,164,731 | 5,391,674 | 6,177,155 | 6,080,493 | 5,637,136 |
| Non-specific match read count (a) | 53,145 | 56,051 | 39,879 | 54,951 | 50,700 | 53,340 | 61,192 |
| Consensus length | 4,822,328 | 4,815,985 | 4,823,742 | 4,823,165 | 4,815,376 | 4,817,334 | 4,808,427 |
| Homology (%) (b) | 99.79 | 99.66 | 99.82 | 99.80 | 99.64 | 99.68 | 87.82 |
| Average coverage (c) | 55.77 | 68.71 | 42.87 | 55.50 | 65.07 | 64.05 | 51.16 |

a Non-specific match read counts are those reads that can be matched to more than one place in the reference genome; such reads were randomly placed in one of the matched spots.

b Homology percentage was calculated as the consensus length divided by the reference length, multiplied by 100.

c Average coverage is the average of all the reads coverage in each area in the consensus sequence.
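The homology calculation in footnote b can be checked directly against the table. The short sketch below recomputes it for two of the columns, using the consensus and reference lengths listed in Table 2; the function name `homology_percent` is a hypothetical helper.

```python
def homology_percent(consensus_length, reference_length):
    """Homology (%) as defined in footnote b: consensus / reference * 100."""
    return 100.0 * consensus_length / reference_length


# (consensus length, reference length) taken from Table 2.
table2 = {
    "ATCC 19698": (4822328, 4832589),
    "DT 78": (4808427, 5475491),
}

for strain, (consensus, reference) in table2.items():
    print(strain, round(homology_percent(consensus, reference), 2))
```

Rounded to two decimals, these reproduce the 99.79 and 87.82 reported in the Homology row.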

Among the sequenced genomes, the genome of M. avium Env 77 provided a significant challenge because of its low level of similarity to the M. avium 104 genome during the reference assembly phase. Accordingly, we employed an algorithm for de novo assembly that generated 772 contigs. These contigs were used as queries in a MegaBLAST search against the Mycobacteria genome database (blast.ncbi.nlm.nih.gov). The coverage of each contig is at least 20× and the average coverage of all contigs is around 30× for this strain. In fact, the Env 77 genome was sequenced twice with similar results for each sequencing run (data not shown). Interestingly, BLAST analysis showed only a third of the Env 77 genome with sequence similarity to the genomes of either M. ap K-10 or M. avium 104, and to a lesser degree to other sequenced mycobacterial genomes, suggesting a mosaic genome structure (Figure 2). Detailed BLAST analysis showed that the Env 77 draft genome shared common conserved genes, mainly with four mycobacterium species, including ribosomal proteins, DNA polymerase, proteinase Clp, cell division protein Fts, and some transcription or translation regulatory factors. As indicated in Figure 2, the genome of M. avium Env 77 has higher similarity to M. avium 104 and M. ap K-10 than to other mycobacterial species. Overall, the sequenced genomes from all strains, except Env 77, mapped to the reference genomes with a significantly high level of similarity. All sequenced genomes were deposited in the GenBank database for download and further analysis. The accession numbers for the deposited sequences are listed in Table A1 in the Appendix.

Genome composition of M. avium Env 77. The MegaBLAST algorithm was used to identify bacteria closely related to all contig sequences from the M. avium Env 77 isolate. Genomes with <10% homology were excluded from representation. Members of the M. tuberculosis complex included M. tuberculosis and M. bovis with sequence divergence <5%. The same criteria were used to formulate the M. avium and M. ap groups.

Genomic rearrangements among M. ap isolates

A major goal of our investigation was to delineate events of insertions and deletions among mycobacterial genomes to better understand their evolutionary relationships. To identify large scale events of insertions/deletions (Indels), we compared the assembled genomes of the six M. ap isolates to the standard M. ap K-10 genome using MAUVE software (version 2.3.1; Darling et al., 2010; Figure 3). Among the potential Indels that could exist among these genomes, we identified only gaps that are <1 kb. A common gap area located at reference position 3,767,550–3,767,870, which is part of the MAPK 3350 gene encoding a hypothetical protein, was seen among all six strains, with a gap size of approximately 300 bp. At this region, low or zero read coverage was observed among all six strains, suggesting a problematic region for the Illumina sequencer. The sequence in this gap region appeared to have a high GC content (82%) but no repetitive elements.

Comparative analysis of M. ap and M. avium from animals and environmental sources. The gapped consensus sequence of each strain was used for comparison by MAUVE version 2.3.1. (A) A close-up depiction of a breaking point in the alignment of six M. ap genomes in comparison to M. ap K-10 reference genome. The white areas indicated low or zero reads. In this example, the flanking sequences of the breaking point contain high GC percentage sequence but not repetitive sequences. (B) Indels among M. ap and M. avium genomes. Notice genome rearrangements are usually surrounding the genome origin of replication.

Based on the MAUVE comparison, the consensus sequences of these six strains are closely matched to the M. ap K-10 genome and no inversions were observed (Figure 3). On the other hand, when MAUVE was used to compare the genomes of the M. ap isolates to the M. avium 104 or M. avium DT78 genomes, about seven large regions of Indels were identified, confirming earlier findings by our group when DNA microarrays were used (Wu et al., 2006). For example, one 11 kb Indel was found in all six M. ap strains at position 2,318,400–2,333,740 (MAPK 2038–MAPK 2050) but was absent from M. avium. This 11 kb region encodes mostly hypothetical proteins in the M. ap K-10 genome with two exceptions, MAPK 2040 and MAPK 2050. MAPK 2040 is a predicted hydrolase, and earlier analysis (Santema et al., 2009) also showed the absence of this gene in M. avium 104 but its presence in another M. avium strain (Table 3). In addition, a total of six genomic inversions spanning approximately 2.4 Mb were identified among all M. ap strains when compared to the M. avium 104 genome, similar to our earlier analysis of only the M. ap K-10 and M. avium 104 genomes (Wu et al., 2006).

Table 3

A list of genes in the 11 kb island which is absent in M. avium 104.

| New annotation (Wynne et al., 2010) | Old annotation (Li et al., 2005) | Length (bp) | Function |
|---|---|---|---|
| MAPK 2038 | MAP 1730c | 1,023 | Hypothetical protein |
| MAPK 2039 | MAP 1729c | 828 | Hypothetical protein |
| MAPK 2040 | MAP 1728c | 723 | YfnB-hydrolase |
| MAPK 2041 | MAP 1727 | 906 | Hypothetical protein |
| MAPK 2042 | MAP 1726c | 585 | Hypothetical protein |
| MAPK 2043 | MAP 1725c | 1,029 | Hypothetical protein |
| MAPK 2044 | MAP 1724c | 558 | Hypothetical protein |
| MAPK 2045 | MAP 1723 | 666 | Hypothetical protein |
| MAPK 2046 | MAP 1722 | 1,221 | Hypothetical protein |
| MAPK 2047 | MAP 1721c | 672 | Hypothetical protein |
| MAPK 2048 | MAP 1720 | 1,020 | Hypothetical protein |
| MAPK 2049 | MAP 1719c | 615 | Hypothetical protein |
| MAPK 2050 | MAP 1718c | 456 | MAP specific protein |

SNPs among M. ap isolates

To better analyze genomic diversity among M. ap isolates, we also examined genomic variations at the nucleotide level. For SNP analysis, we set stringent criteria for SNP detection (see Materials and Methods). The total number of SNPs among the six M. ap genomes ranged from 56 to 131 (Figure 4), among which 17 were found in >1 genome (Table 4). The number of non-synonymous SNPs (nSNPs) is slightly higher than that of synonymous SNPs (sSNPs), suggesting a positive selective pressure on the identified genes. In addition, most genes harbored one SNP, with the exception of 23 genes that contained two or three SNPs (Table A2 in Appendix). Interestingly, GlnE and MAPK 4304 contained three SNPs each, all of them nSNPs, suggesting a high selective pressure on these two genes. The majority of genes containing >1 SNP are larger than 1 kb in size, with an average SNP density of 1 SNP per 1.44 kb. The remaining 232 genes that harbored only one SNP showed a similar SNP density of one SNP per 1.44 kb, as was identified in other mycobacteria (Qi et al., 2009). For M. ap JTC 1281 and M. ap 4B, the percentages of nSNPs were 52.68 and 51.76%, respectively, and in the rest of the M. ap strains >60% of SNPs were nSNPs. Interestingly, genes encoding the Cytochrome P450 proteins harbored a high number of alleles in three of the six examined genomes (Table 5), similar to the same family of genes in M. tuberculosis (Cole, 1999). Intergenic SNPs were identified and accounted for <10% of total SNPs.

The total number of single nucleotide polymorphisms (SNPs) among M. ap isolates. The numbers of nSNPs (non-synonymous), sSNPs (synonymous) and SNPs in the intergenic regions are color coded as indicated. SNPs were detected using the reference-assembled sequences of each strain. Between 56 and 131 SNPs were detected per M. ap isolate, as reported in the text. The percentage of nSNPs is generally higher than that of sSNPs, which indicates a high selective pressure in these strains.

Table 4

A list of non-synonymous SNPs in the M. ap genome found in more than one strain.

| No. | Strains | K-10 position | K-10 allele | Variation | Gene | Function |
|---|---|---|---|---|---|---|
| 1 | All 6 strains | 3,259,329 | C | T | MAPK 2850 | Trypsin-like serine protease |
| 2 | All 6 strains | 4,394,282 | A | G | MAPK 3393 | Fucose permease |
| 3 | All 6 strains | 2,041,445 | T | C | glnE | Glutamine synthase |
| 4 | ATCC 19698, JTC 1281, JTC 1285, DT 3, Env 210 | 1,169,976 | A | C | MAPK 1064 | Hemolysin-like protein |
| 5 | ATCC 19698, JTC 1281, JTC 1285, DT 3, Env 210 | 91,310 | A | G | nirB | Nitrate reductase |
| 6 | JTC 1281, JTC 1285, M. ap 4B, Env 210 | 3,133,871 | G | A | speE | Spermidine synthase |
| 7 | JTC 1281, M. ap 4B, DT 3, Env 210 | 2,806,612 | G | T | cydD | ATP-binding protein ABC transporter CydD |
| 8 | ATCC 19698, JTC 1281, JTC 1285, DT 3 | 3,278,891 | A | T | pyrH | Uridylate kinase PyrH |
| 9 | ATCC 19698, JTC 1281, DT 3 | 1,204,735 | T | C | bpoB | Peroxidase BpoB |
| 10 | JTC 1281, JTC 1285, Env 210 | 4,206,587 | C | T | pks2 | Polyketide synthase Pks2 |
| 11 | M. ap 4B, Env 210 | 150,857 | G | C | lipW | Esterase LipW |
| 12 | M. ap 4B, Env 210 | 225,551 | C | T | fctA | Transferase |
| 13 | M. ap 4B, Env 210 | 647,971 | C | A | nuoL | NADH dehydrogenase subunit L |
| 14 | M. ap 4B, Env 210 | 2,353,857 | C | A | MAPK 2071, hspR | Heat shock regulator protein |
| 15 | M. ap 4B, Env 210 | 3,981,515 | G | A | pks13 | Polyketide synthase Pks13 |
| 16 | M. ap 4B, Env 210 | 4,262,844 | T | G | MAPK 3814 | Lipoprotein |
| 17 | ATCC 19698, DT 3 | 1,363,662 | A | C | MAPK 1234 | Arabinose efflux permease |

Table 5

A list of nSNP in cytochrome P450 proteins.

| Strains | K-10 position | K-10 allele | Variation | Gene | Amino acid change (functional consequence) |
|---|---|---|---|---|---|
| Env 210 | 1,227,540 | A | G | MAPK 1119 | Ile → Met (non-polar) |
| JTC 1285 | 1,301,615 | C | T | MAPK 1184 | Glu → Lys (polar acidic → polar basic) |
| JTC 1285 | 2,024,939 | G | A | MAPK 1789 | Ala → Val (non-polar) |
| JTC 1281 | 1,973,792 | A | G | MAPK 1738 | Val → Ala (non-polar) |
| JTC 1281 | 3,841,168 | G | C | MAPK 3424 | Arg → Pro (polar basic → non-polar) |
Generally, a modest number of SNPs were detected among the genomes of M. ap isolates, unlike M. avium isolates. The M. avium DT 78 genome had a significantly high number of SNPs (6,278 SNPs) when compared to the standard M. avium 104 genome, suggesting an earlier separation of this strain during its evolutionary pathway. In addition, >75% of the identified SNPs were synonymous, an indication of a higher stabilizing selective pressure on M. avium genes than on those of M. ap. For M. avium Env 77, SNP detection was not performed because the whole sequence aligned poorly with either M. ap K-10 or M. avium 104. Finally, 10 SNPs were randomly chosen for further confirmation using the Sanger sequencing method. The 10 SNPs were chosen based on the ATCC 19698 genome. The same 10 SNPs were also found in JTC 1281, while only 5 common SNPs were found in JTC 1285. All amplicons were sequenced from both forward and reverse strands (Table A3 in Appendix). Three SNPs were not detected in JTC 1285 based on the Sanger results, which is most likely due to Illumina sequencing error. Overall, Illumina sequencing was very beneficial in providing a high level of single nucleotide polymorphism detail in all examined genomes.

Phylo-genomic relationship among M. ap isolates

Single nucleotide polymorphisms of the six M. ap strains were concatenated and used for phylogenetic analysis on a genome-wide (phylo-genome) level. The two reference strains, M. ap K-10 and M. avium 104, were included in the analysis. A total of 301 SNPs present among the six M. ap strains as well as in the M. avium 104 and M. avium DT 78 genomes were included in this analysis using the Neighbor-joining method (Tamura et al., 2011). The un-rooted tree showed a strong discriminatory power of SNPs for all examined isolates based on their origin (Figure 5A) while keeping the branches of M. avium genomes separate from the genomes of M. ap isolates. Such discriminatory power was not possible when single-gene genotypes were tried (see above). Nonetheless, when the tree was rooted to the M. avium 104 genome, two distinct major branches within the M. ap genomes were easily discerned (Figure 5B).
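To make the concatenation step concrete, the sketch below joins per-position alleles into one string per strain and computes pairwise p-distances (the fraction of differing positions), which is the kind of distance matrix a neighbor-joining method takes as input. The distance measure actually used by the authors is not stated here, so p-distance is an assumption, and the strain names and alleles are hypothetical toy values, not data from this study.

```python
from itertools import combinations


def concatenate_snps(alleles):
    """Join the allele called at each SNP position into one string."""
    return "".join(alleles)


def p_distance(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(sequences):
    """Pairwise p-distances, keyed by (name_a, name_b) in sorted name order."""
    return {(a, b): p_distance(sequences[a], sequences[b])
            for a, b in combinations(sorted(sequences), 2)}


# HYPOTHETICAL toy alleles at four SNP positions, for illustration only.
toy = {
    "strain_A": concatenate_snps(["C", "A", "T", "G"]),
    "strain_B": concatenate_snps(["T", "A", "T", "G"]),
    "strain_C": concatenate_snps(["T", "G", "C", "G"]),
}

for pair, dist in distance_matrix(toy).items():
    print(pair, dist)
```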

Phylogenomic analysis of M. ap and M. avium strains. (A) A dendrogram displaying an un-rooted, Neighbor-joining tree of the concatenated SNPs from all eight mycobacterial isolates under study. (B) A rooted Neighbor-joining tree using the M. ah 104 genome as outgroup. The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches found in less than 50% of the bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches.

In one branch within the M. ap genomes (Figure 5B), an isolate from red deer (M. ap DT 3) was closely related to the standard cattle strains (M. ap K-10 and ATCC 19698). On the other hand, isolates from goat and oryx (M. ap JTC 1281 and JTC 1285, respectively) were more closely related to the recently isolated cattle type strain (M. ap K-10) than to the other laboratory strain (ATCC 19698), suggesting a cattle source of infection. In the other branch of the tree, the M. ap 4B and M. ap Env 210 isolates, from human and dairy farm, respectively, were closely related to each other. It is noteworthy that the association of the M. avium DT 78 genome with the M. avium 104 strain based on phylo-genomic analysis confirmed our earlier identification of this isolate as belonging to the M. avium group despite its overall higher similarity to M. ap K-10. Finally, when we tried three additional methods for tree construction (MP, ML, MLK) on independent lists of sSNPs and nSNPs, a congruent topology was obtained for all trees with high bootstrap support, similar to the one shown in Figure 5B. The Log Likelihood Ratio test for the MLK consensus tree against the ML tree indicated that the molecular clock assumption was not valid (p < 0.007). Overall, the identified tree topology suggests M. avium 104 as a common ancestor from which M. ap likely emerged and diversified into two lineages: one lineage that clustered Env 210 with M. ap 4B (human), and a second that clustered all type II strains of M. ap. In both lineages, infected cows are the most likely reservoir for spreading the type II M. ap strains.


You can certainly do something like this with OpenMP, but it isn't as simple as putting #pragma omp parallel around a for loop. For that structure, the compiler needs to know at the time of entering the loop how many iterations will be made, so that it can decompose the iterations across threads, and you don't necessarily have that information here when you're exiting once you've found something.

You can make something like this work - and it can be very useful if the test you need to perform is very CPU heavy (here, I have a made-up example of brute-force primality testing), so that you're breaking up the work amongst several cores, and you only care about finding a result (or that there are none). But note that you are definitely not guaranteed that doing this in parallel will return the first result.

In the below example, we have a flag found that is set (using an omp atomic capture construct) when a thread finds an item. If it was the first to set the flag, it stores the value and location. Once the threads (eventually) see the flag has been set, they all return from the while loop.


Performance

Today (2019-12-09) I conducted performance tests on macOS v10.13.6 (High Sierra) for the chosen solutions. I show delete (A), but I do not use it in the comparison with other methods, because it leaves an empty slot in the array.

  • the fastest solution is array.splice (C) (except on Safari for small arrays, where it comes second)
  • for big arrays, array.slice+splice (H) is the fastest immutable solution on Firefox and Safari; Array.from (B) is fastest in Chrome
  • mutable solutions are usually 1.5x-6x faster than immutable ones
  • for small arrays on Safari, surprisingly, the mutable solution (C) is slower than the immutable solution (G)

Scalability and sparsity issues in recommender datasets: a survey

Recommender systems have been widely used in various domains, including movies, news, and music, with the aim of providing the most relevant proposals to users from a variety of available options. Recommender systems are designed using techniques from many fields, including machine learning, information retrieval, data mining, linear algebra and artificial intelligence. Though in-memory nearest-neighbor computation is a typical approach for collaborative filtering due to its high recommendation accuracy, its scalability is still poor given a huge user and item base and the availability of only a few ratings (i.e., data sparsity) in archetypal merchandising applications. In order to alleviate scalability and sparsity issues in recommender systems, several model-based approaches were proposed in the past. However, if research in recommender systems is to achieve its potential, there is a need to understand the prominent techniques used directly to build recommender systems or to preprocess recommender datasets, along with their strengths and weaknesses. In this work, we present an overview of some of the prominent traditional as well as advanced techniques that can effectively handle data dimensionality and data sparsity. The focus of this survey is the applicability of some advanced techniques, particularly clustering, biclustering, matrix factorization, graph-theoretic, and fuzzy techniques, in recommender systems. In addition, it highlights recent research works done using each technique.


Online delivery route recommendation in spatial crowdsourcing

With the emergence of many crowdsourcing platforms, crowdsourcing has gained much attention. Spatial crowdsourcing is a rapidly developing extension of the traditional crowdsourcing, and its goal is to organize workers to perform spatial tasks. Route recommendation is an important concern in spatial crowdsourcing. In this paper, we define a novel problem called the Online Delivery Route Recommendation (OnlineDRR) problem, in which the income of a single worker is maximized under online scenarios. It is proved that no deterministic online algorithm for this problem has a constant competitive ratio. We propose an algorithm to balance three influence factors on a worker’s choice in terms of which task to undertake next. In order to overcome its drawbacks resulting from the dynamic nature of tasks, we devise an extended version which attaches gradually increased importance to the destination of the worker over time. Extensive experiments are conducted on both synthetic and real-world datasets and the results prove the algorithms proposed in this paper are effective and efficient.


Introduction

Location information is crucial for most applications and protocol designs in high-speed vehicular ad-hoc networks (VANETs), ranging from information exchange to in-network storage. In traditional approaches, location information can be obtained through localization techniques. With certain object tracking and information publication mechanisms, the locations of mobile objects are also available to users. Localization and object tracking are extensively studied topics and many useful algorithms have been proposed. Recently, real traffic traces, maps and even traffic patterns have been introduced to assist routing in vehicular networks[1–3].

In highly dynamic environments such as VANETs, however, these approaches are not efficient due to the high mobility of objects. In VANETs, objects are typically vehicles that exhibit mobility on the order of hundreds of kilometers per hour. Therefore, the locations of the vehicle objects keep changing dramatically on a large scale. This requires the localization techniques to be invoked frequently and the location information to be updated continuously, incurring a large amount of communication and control overhead. Recall that the communication capacity of wireless networks is constrained by the wireless medium[4]. As the network scales up, the demand for control packet exchange increases while the network capacity decreases, making the problem more serious. In other words, these traditional approaches are not scalable in large-scale VANETs.

In this article, we propose a novel approach, which is based on our observation that vehicles in urban environments are well behaved and their movements can be predicted accurately. More specifically, a VANET in an urban environment is structured by the traffic transportation network, such as roads, bridges, and tunnels. Vehicles have to strictly follow the road and travel along a single direction of each road segment. When the speeds of vehicles are available (they can be obtained directly from the vehicles' speedometers), the locations of the vehicles over a short future period can be calculated by a simple equation. Moreover, vehicles in urban areas often have clear destinations and the desired transport routes are limited. When the destinations are predicted according to the source of the vehicles, their present locations, and their moving directions, the locations of the vehicles over a relatively long time can also be predicted accurately to a large degree. As such, for each vehicle we can obtain its location in a proactive manner rather than the traditional reactive manner, and a large amount of control overhead can be saved.
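The article does not spell out the "simple equation" here, so the following is only an assumed form of it: constant-speed dead reckoning along the current road segment, clipped at the end of the segment. The function name and all parameters are hypothetical.

```python
def predict_offset(offset_m, speed_mps, dt_s, segment_length_m):
    """Predicted distance (m) travelled along the current road segment.

    ASSUMPTION: the vehicle keeps a constant speed for dt_s seconds and
    stays on the segment; the result is clipped at the segment end.
    """
    if dt_s < 0:
        raise ValueError("dt_s must be non-negative")
    return min(offset_m + speed_mps * dt_s, segment_length_m)


# A vehicle 100 m into a 500 m segment, driving at 15 m/s, 4 s ahead.
print(predict_offset(100.0, 15.0, 4.0, 500.0))
```

Once the predicted offset reaches the segment length, choosing the next segment requires a route prediction, which is where the mobility patterns discussed next come in.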

To validate this idea, we first extract the Vehicular Mobility Pattern (VMP) by applying Variable-order Markov (VOM) models[5] to real trace data collected from over 4,000 taxis over several months in Shanghai. We find that, because of the intrinsic nature of roads (single and dual carriageways, freeways) and individual driving habits, the traffic traces contain a large number of reusable mobility patterns, which account for around 40% of the whole traces; that is, VMP typically includes fixed routes or a vehicle's favorite paths given its starting place and destination. To show the benefits of VMP, we then propose a Prediction-based Soft Routing Protocol (PSR) in which the traffic trace and the real digital road map are used to assist packet routing. In PSR, the disseminated state information carries a vehicle's current state and its predicted states, and the state information is requested and updated only when the last predicted state is inconsistent with the vehicle's current state, which significantly reduces the control packet overhead. Finally, extensive experimental results show that VMP exhibits quite high accuracy and offers a significant improvement to routing design by cutting control overhead. In PSR, the control traffic overhead increases linearly with the number of nodes in the network, regardless of network size or mobility.

The rest of the article is organized as follows. In the following section, we present the network model and the VOM scheme used to generate VMP. We discuss the design of PSR in Section “PSR design”, followed by the performance evaluation in Section “Performance evaluation”. Section “Related study” reviews related work. Section “Conclusions” concludes the study and outlines possible directions for future work.

Let $T = r_1, r_2, \ldots, r_n$ denote the trajectory sequence of a vehicle node, where $r_i$ denotes the $i$th road the node passes, $r_1, r_2, \ldots, r_n \in R = \langle R_1, R_2, \ldots, R_m \rangle$, $R$ is the set of all roads, and $m = |R|$ is the cardinality of $R$. A sequence segment $r_i^k$ is defined as $r_i^k = r_i r_{i+1} \cdots r_{i+k-1}$, where $k$ is the length of the sequence segment and $r_i^0 = \varepsilon$.

Definition 1

The term VMP refers to a trajectory segment $r_i^k$ with high probability, that is, $f(r_i^k) = \Pr(r_{i+k-1} \mid r_i^{k-1}) \ge \sigma$, where $\sigma \in [0, 1]$ is a predefined threshold.

VMP in the real trace

A previous study[6] shows that people move with regularity and repeat journeys to the same places. Our analysis of the traffic trace also shows that people have a high degree of regularity in their movement despite complex driving behavior. For example, consider the condition of roads. Freeways normally have a limited number of entrances and exits. The speed and direction of vehicles on them are relatively stable, so we can easily infer a vehicle's future trajectory from its current position and velocity until it reaches the end of the freeway. Similarly, if a road connects at one end to only one other road, which is also the popular path, we can estimate that most vehicles will turn that way, with very few exceptions making U-turns back to the previous road. The paths to some hot spots are also relatively fixed.

Figure 1 displays some VMP mined according to road conditions. South Chongqing Road is a bidirectional freeway on which vehicle nodes travel at high speed and stay on the freeway until they reach an exit. The VMP in Figure 1a shows pairs of bidirectional segments, in accordance with the above analysis. As another example, the path from the urban area to Shanghai Pudong International Airport is a highway that most drivers prefer when going to the airport. It therefore forms the VMP shown in Figure 1b.

VMP on the digital road map of Shanghai. (a) VMP around South Chongqing Rd. on the digital map. (b) VMP from urban area to Shanghai Pudong International Airport.

We can also take the behavior of individual vehicle nodes into account. Admittedly, there is no apparent rule to follow because individual habits are diverse. Yet we can still uncover some hidden patterns. Since people are prone to repeating the same journey to the same place[6], we are able to mine the potential VMP from their historical statistics.

We randomly choose one taxi and use its real traces over a period of 6 months to generate VMP, and mark the patterns on the map to give a direct view, as shown in Figure 2. From this distribution of VMP on the map of the Shanghai urban area, we find that VMP covers a large proportion of the roads.

Distribution of VMP on the map of Shanghai urban area.

VMP generation

The VMP mining problem involves stochastic chains of finite order, meaning that transition probabilities depend on a finite suffix of the past and the lengths of all suffixes are bounded. More specifically, for a vehicle node on the current road $r_c$, its possible patterns are $r_{c-k}^{k}\, r_c\, r_{c+1}$ $(1 \le k \le K)$, where $K$ is the maximal number of roads preceding $r_c$ and is a predefined value. Clearly, $K$ is an upper bound on the maximal Markov order. Among all these possible patterns, those whose probability reaches the threshold $\sigma$ form the final VMP.

Markov chains have been widely used for predicting the future location of an object. In a Markov chain, however, each random variable in the sequence depends on a fixed number of preceding random variables. Consequently, the number of possible patterns would be very large, and checking all of them would incur overwhelming complexity. We reduce the cost by pruning unnecessary patterns. First, since our patterns do not all have the same length, a VOM model is better suited to our problem and reduces the state space significantly. Second, although there are $|R|$ roads in total, the patterns likely to have high frequency are those whose consecutive roads are connected. Third, the value $K$ is generally small, as shown later in Section “Performance evaluation”, so it is set to 5 in our simulation.
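The pruning ideas above can be illustrated with a minimal sketch. It is not the authors' algorithm from Figure 4: the function name `mine_patterns`, the adjacency map `adj`, and the injected probability estimator `estimate(context, next_road)` are all hypothetical, and roads are assumed to be single-character labels so that a trajectory can be written as a string.

```python
def mine_patterns(trajectory, adj, estimate, K=5, sigma=0.5):
    """Collect segments r_{c-k} .. r_c r_{c+1} whose estimated probability is >= sigma.

    trajectory: string of road labels (assumption: one character per road)
    adj:        dict mapping a road to the set of roads connected to it (hypothetical)
    estimate:   callable(context, next_road) -> probability (hypothetical estimator)
    """
    patterns = set()
    for c in range(1, len(trajectory) - 1):
        nxt = trajectory[c + 1]
        if nxt not in adj[trajectory[c]]:
            continue                          # r_c and r_{c+1} are not connected
        for k in range(1, min(K, c) + 1):     # at most K preceding roads
            context = trajectory[c - k:c + 1]  # r_{c-k} .. r_c
            if context[1] not in adj[context[0]]:
                break                         # longer contexts would contain the same gap
            if estimate(context, nxt) >= sigma:
                patterns.add(context + nxt)
    return patterns


# Toy usage with a made-up estimator that only favours "d" after "ac".
roads = "abcd"
fully_connected = {r: set(roads) - {r} for r in roads}
toy_estimate = lambda ctx, nxt: 0.9 if ctx.endswith("ac") and nxt == "d" else 0.1
print(mine_patterns("acdacb", fully_connected, toy_estimate))
```

Bounding `k` by `K` and stopping at the first unconnected pair keeps the number of candidate contexts per position at most `K`, which is the effect the second and third pruning rules aim for.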

To estimate the probabilities and mine VMP, we adapt an effective VOM model[3] that is very popular in lossless compression and is also widely used in sequence prediction. The algorithm is as follows.

Incremental parsing procedure. We sequentially parse $r_1^n$ into ‘phrases’ that are adjacent and non-overlapping. The first phrase is the empty phrase O. A new phrase is then created as soon as a prefix of the unparsed part of the string differs from all preceding phrases. Figure 3 shows an instance of a road map, from which the road sequence acdacbacdabdc is generated. Parsing this sequence yields the phrases O, a, c, d, ac, b, acd, ab, dc.
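The parsing step can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `incremental_parse` is hypothetical, and roads are assumed to be single-character labels so that the example sequence can be passed as a string. The empty phrase O is kept implicit in the set of seen phrases.

```python
def incremental_parse(sequence):
    """Split a road sequence into adjacent, non-overlapping phrases."""
    seen = {""}        # the empty phrase O is always present
    phrases = []
    current = ""
    for road in sequence:
        current += road
        if current not in seen:   # shortest prefix that differs from all earlier phrases
            seen.add(current)
            phrases.append(current)
            current = ""
    # Any trailing remainder repeats an earlier phrase and is returned separately.
    return phrases, current


phrases, leftover = incremental_parse("acdacbacdabdc")
print(phrases)   # ['a', 'c', 'd', 'ac', 'b', 'acd', 'ab', 'dc']
```

Each new phrase extends an earlier phrase by exactly one road, which is what lets the learning phase store the phrases as paths in a tree.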

Learning phase. A multi-way parsing tree is constructed to hold the parsed phrases. Each node in the tree carries a counter that holds statistics of $r_1^n$, and each internal node has exactly |R| children (|R| = 4 in the above example). Each phrase corresponds to a path in the tree that starts at the root and ends at some internal node. Going through the parsed sequence starting with O, we add each phrase to the tree as follows. First, the empty phrase O is added to the tree as the root, and its |R| children are added as leaf nodes. The counter of each leaf node is always set to 1. The counter of each internal node is updated so that it always equals the sum of its children's counters. Then, for each phrase, we traverse the tree starting at the root. Once a leaf node is reached, it is transformed into an internal node by adding |R| leaf children to it.
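A minimal sketch of the learning phase follows, assuming the phrases come from the incremental parse of the example sequence and that roads are single-character labels. The names `Node` and `build_tree` are hypothetical. The sketch relies on the property that every phrase extends an earlier phrase by one road, so the walk for a phrase always ends on a leaf.

```python
class Node:
    def __init__(self):
        self.children = {}   # empty for a leaf
        self.count = 1       # leaf counters are always 1

    def expand(self, alphabet):
        """Turn a leaf into an internal node with one leaf child per road."""
        self.children = {road: Node() for road in alphabet}

    def refresh(self):
        """Set the counter of an internal node to the sum of its children's counters."""
        self.count = sum(child.count for child in self.children.values())


def build_tree(phrases, alphabet):
    root = Node()                 # the empty phrase O
    root.expand(alphabet)
    root.refresh()
    for phrase in phrases:
        path = [root]
        for road in phrase:       # walk from the root along the phrase
            path.append(path[-1].children[road])
        path[-1].expand(alphabet)      # the leaf reached becomes an internal node
        for node in reversed(path):    # update counters bottom-up
            node.refresh()
    return root


root = build_tree(["a", "c", "d", "ac", "b", "acd", "ab", "dc"], "abcd")
print(root.count, root.children["a"].count)   # 28 13
```

Phrases that do not come from an incremental parse may raise a `KeyError` here, because the walk could step below a leaf.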

To estimate the probability $f(r_i^k) = \Pr(r_{i+k-1} \mid r_i^{k-1})$, we traverse the parsing tree starting from the root O according to the sequence $r_i^{k-1}$. If we reach a leaf node before the sequence $r_i^{k-1}$ ends, we jump back to the root and continue the traversal until the sequence is used up. We then go one step further according to $r_{i+k-1}$ and reach the final node $d$. The estimate is then $\Pr(r_{i+k-1} \mid r_i^{k-1}) = c(d)/c(\mathrm{Parent}(d))$, where $c(d)$ denotes the counter of node $d$.
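The traversal rule can be sketched as follows. To keep the example short, the tree is represented only by the counters of its internal nodes, keyed by the path from the root; the values in `INTERNAL` are what the learning phase yields for the example phrases a, c, d, ac, b, acd, ab, dc with |R| = 4, and every node not listed is a leaf with counter 1. The names `INTERNAL`, `counter`, and `estimate` are hypothetical. One interpretation is assumed here: when the context ends exactly on a leaf, the walk also jumps back to the root before the final step, which matches the traversal order O → d → a → O → c given for Pr(c|da) in this section.

```python
# Counters of internal nodes, keyed by path from the root ("" is the root O).
INTERNAL = {"": 28, "a": 13, "b": 4, "c": 4, "d": 7,
            "ac": 7, "ab": 4, "acd": 4, "dc": 4}


def counter(path):
    """Counter c(.) of a node; nodes not listed in INTERNAL are leaves."""
    return INTERNAL.get(path, 1)


def estimate(context, next_road):
    """Estimate Pr(next_road | context) as c(d) / c(Parent(d))."""
    node = ""
    for road in context:
        if node not in INTERNAL:   # standing on a leaf: jump back to the root
            node = ""
        node += road
    if node not in INTERNAL:       # context ended on a leaf: jump back before the last step
        node = ""
    d = node + next_road           # final node d
    return counter(d) / counter(node)


print(round(estimate("ac", "d"), 2))   # 0.57
print(round(estimate("da", "c"), 2))   # 0.14
```

Because the counter of every internal node equals the sum of its children's counters, the estimates over all possible next roads from a given context sum to 1.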

An instance of road map.

The pseudocode of our VMP generation algorithm is given in Figure 4, where $P$ denotes the final pattern set and $Adj(r_i)$ the set of roads adjacent to road $r_i$. Figure 5 shows the parsing tree for the above sequence instance acdacbacdabdc. To estimate Pr(d|ac), we traverse the tree in the order O → a → c → d and get Pr(d|ac) = 4/7 ≈ 0.57. For Pr(c|da), we traverse in the order O → d → a → O → c and get Pr(c|da) = 4/28 ≈ 0.14.