[Rd] Get Logical processor count correctly whether NUMA is enabled or disabled

Tomas Kalibera tomas.kalibera at gmail.com
Mon Sep 3 14:07:23 CEST 2018


A summary for reference: the new detectCores() for Windows in R-devel
seems to work for both logical and physical cores on systems with more
than 64 logical processors (thanks to Arun for testing!). If this
feature is important to you, particularly if you use an older version of
Windows and/or a system with >64 logical processors, it would be nice if
you could test it and report any problems.
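
For anyone who wants to test, a minimal sketch of what could be run and
pasted into a report. The helper name core_report() is made up for this
illustration; detectCores(), R.version and R.version.string are standard.

```r
library(parallel)

# Hypothetical helper: gather the details that are useful in a report.
core_report <- function() {
  list(
    r_version = R.version.string,         # e.g. the R-devel version string
    svn_rev   = R.version[["svn rev"]],   # revision of the build under test
    platform  = R.version[["platform"]],
    logical   = detectCores(logical = TRUE),
    physical  = detectCores(logical = FALSE)
  )
}

str(core_report())
```

On a system with more than 64 logical processors, the "logical" value
should match the total shown by the operating system, not just the size
of one processor group.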

As I mentioned earlier, in older versions of R one can as a workaround 
use "wmic" to detect the number of processors on systems with >64 
logical processors (with appropriate error handling added as needed):

# detectCores()
out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE))))

# detectCores(logical=FALSE)
out <- system("wmic cpu get numberofcores", intern=TRUE)
sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out, 
value=TRUE))))
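
One possible way to add the error handling mentioned above is sketched
here. The function names parse_wmic_counts() and wmic_count() are
invented for this example, and the sketch returns NA when "wmic" is not
available or prints nothing usable; adjust as needed.

```r
# Sum the per-socket counts printed by "wmic cpu get <property>".
# Lines that are not purely numeric (header, blank lines) are ignored.
parse_wmic_counts <- function(out) {
  vals <- grep("^[[:space:]]*[0-9]+[[:space:]]*$", out, value = TRUE)
  if (length(vals) == 0L) return(NA_integer_)
  sum(as.integer(gsub("[^0-9]", "", vals)))
}

# Run wmic and parse its output; NA on non-Windows systems or on failure.
wmic_count <- function(property) {
  if (.Platform$OS.type != "windows") return(NA_integer_)
  out <- tryCatch(
    suppressWarnings(system(paste("wmic cpu get", property), intern = TRUE)),
    error = function(e) character(0)
  )
  parse_wmic_counts(out)
}

# The output reported later in this thread, used as sample input.
sample_out <- c("NumberOfLogicalProcessors  \r", "22  \r", "22  \r",
                "20  \r", "22  \r", "\r")
parse_wmic_counts(sample_out)  # 86
```

On Windows, wmic_count("numberoflogicalprocessors") and
wmic_count("numberofcores") would then stand in for detectCores() and
detectCores(logical=FALSE) respectively.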

The remaining problem with running using >64 processors on Windows 
turned out to be due to a bug in sockets communication, debugged and 
fixed in R-devel by Luke Tierney.
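
To check the behaviour on a given build, the sleep test from the quoted
message below could be repeated along these lines. This is only a
sketch: it uses a PSOCK cluster from the parallel package rather than
snow/doSNOW/foreach as in the original test, and the helper names are
made up for the illustration. With n identical tasks spread over p
workers, the elapsed time should be roughly ceiling(n / p) times the
task duration, plus some overhead.

```r
library(parallel)

# Rough lower bound on elapsed time for n_tasks equal-length tasks
# spread over n_workers workers.
expected_elapsed <- function(n_tasks, n_workers, task_secs = 0.5) {
  ceiling(n_tasks / n_workers) * task_secs
}

# Start a PSOCK cluster, run n_tasks sleeps on it and return elapsed seconds.
time_sleep_tasks <- function(n_workers, n_tasks, task_secs = 0.5) {
  cl <- makePSOCKcluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  system.time(
    clusterApply(cl, rep(task_secs, n_tasks), Sys.sleep)
  )[["elapsed"]]
}

expected_elapsed(64, 64)  # 0.5
expected_elapsed(65, 64)  # 1

# On a machine with more than 64 logical processors one could then run,
# for example:
# time_sleep_tasks(65, 65)
```

An elapsed time far above the expected value, or a run that never
finishes, would point to the problem described above.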

Tomas

On 08/29/2018 12:42 PM, Srinivasan, Arunkumar wrote:
> Dear Tomas, thank you very much. I installed r-devel r75201 and tested.
>
> The machine with 88 cores has NUMA disabled. It therefore has 2 processor groups, with 64 and 24 processors respectively.
>
> require(parallel)
> detectCores()
> # [1] 88
>
> This is great!
>
> Then I went on to test with a simple 'foreach()' loop. I started with 64 processors (max limit of 1 processor group). I ran with a simple function of 0.5s sleep.
>
> require(snow)
> require(doSNOW)
> require(foreach)
>
> cl <- makeCluster(64L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> # user  system elapsed
> # 0.06    0.00    0.64
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.03    0.01    1.04
> stopCluster(cl)
>
> With a cluster of 64 processors and loop running with 64 iterations, it completed in ~.5s (0.64), and with 65 iterations, it took ~1s as expected.
>   
> cl <- makeCluster(65L, "SOCK")
> registerDoSNOW(cl)
> system.time(foreach(i=1:64) %dopar% Sys.sleep(0.5))
> #    user  system elapsed
> #    0.03    0.02    0.61
> system.time(foreach(i=1:65) %dopar% Sys.sleep(0.5))
> # Timing stopped at: 0.08 0 293
> stopCluster(cl)
>
> However, when I increased the cluster to 65 processors, a loop with 64 iterations seemed to complete as expected, but using all 65 processors to loop over 65 iterations did not complete. I stopped it after ~5 minutes. The same happens when the cluster is started with any number of processors between 65 and 88. It seems to me that we are still not able to use >64 processors at the same time, even though detectCores() now returns the right count.
>
> I'd appreciate your thoughts on this.
>
> Best,
> Arun.
>
> -----Original Message-----
> From: Tomas Kalibera <tomas.kalibera using gmail.com>
> Sent: 27 August 2018 19:43
> To: Srinivasan, Arunkumar <Arunkumar.Srinivasan using uk.mlp.com>; r-devel using r-project.org
> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA is enabled or disabled
>
> Dear Arun,
>
> thank you for checking the workaround scripts.
>
> I've modified detectCores() to use GetLogicalProcessorInformationEx. It is in revision 75198 of R-devel, could you please test it on your machines? For a binary, you can wait until the R-devel snapshot build gets to at least this svn revision.
>
> Thanks for the link to the processor groups documentation. I don't have a machine to test this on, but I would hope that snow clusters (e.g. PSOCK) should work fine on systems with >64 logical processors as they spawn new processes (not just threads). Note that FORK clusters are not supported on Windows.
>
> Thanks
> Tomas
>
> On 08/21/2018 02:53 PM, Srinivasan, Arunkumar wrote:
>> Dear Tomas, thank you for looking into this. Here's the output:
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> [1] "NumberOfLogicalProcessors  \r" "22                         \r" "22                         \r"
>> [4] "20                         \r" "22                         \r" "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE)))) # [1] 86
>>
>> [I've asked the IT team to understand why one of the values is 20 instead of 22].
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> [1] "NumberOfCores  \r" "22             \r" "22             \r" "20             \r" "22             \r"
>> [6] "\r"
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE)))) # [1] 86
>>
>> [Currently hyperthreading is disabled. So this output being identical to the previous output makes sense].
>>
>> system("wmic computersystem get numberofprocessors")
>> NumberOfProcessors
>> 4
>>
>> In addition, I'd like to bring to your attention this documentation on processor groups: https://docs.microsoft.com/en-us/windows/desktop/ProcThread/processor-groups. It explains how to make a process run on multiple groups (which seems to be different from NUMA). All this seems overly complicated just to allow a process to use all cores by default, TBH.
>>
>> Here's a project on Github 'fio' where the issue of running a process on more than 1 processor group has come up -  https://github.com/axboe/fio/issues/527 and is addressed - https://github.com/axboe/fio/blob/c479640d6208236744f0562b1e79535eec290e2b/os/os-windows-7.h . I am not sure though if this is entirely relevant since we would be forking new processes in R instead of allowing a single process to use all cores. Apologies if this is utterly irrelevant.
>>
>> Thank you,
>> Arun.
>>
>> From: Tomas Kalibera <tomas.kalibera using gmail.com>
>> Sent: 21 August 2018 11:50
>> To: Srinivasan, Arunkumar <Arunkumar.Srinivasan using uk.mlp.com>;
>> r-devel using r-project.org
>> Subject: Re: [Rd] Get Logical processor count correctly whether NUMA
>> is enabled or disabled
>>
>> Dear Arun,
>>
>> thank you for the report. I agree with the analysis, detectCores() will only report logical processors in the NUMA group in which R is running. I don't have a system to test on, could you please check these workarounds for me on your systems?
>>
>> # number of logical processors - what detectCores() should return
>> out <- system("wmic cpu get numberoflogicalprocessors", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of cores - what detectCores(FALSE) should return
>> out <- system("wmic cpu get numberofcores", intern=TRUE)
>> sum(as.numeric(gsub("([0-9]+).*", "\\1", grep("[0-9]+[ \t]*", out,
>> value=TRUE))))
>>
>> # number of physical processors - as a sanity check
>>
>> system("wmic computersystem get numberofprocessors")
>>
>> Thanks,
>> Tomas
>>
>> On 08/17/2018 05:11 PM, Srinivasan, Arunkumar wrote:
>> Dear R-devel list,
>>
>> R's detectCores() function internally calls the "ncpus" function to get the total number of logical processors. However, this does not seem to take NUMA into account on Windows machines.
>>
>> On a machine with 48 processors (24 cores) in total and Windows Server 2012 installed, if NUMA is enabled with 2 nodes (node 0 and node 1, each having 24 CPUs), then R's detectCores() only detects 24 instead of the total 48. If NUMA is disabled, detectCores() returns 48.
>>
>> Similarly, on a machine with 88 cores (176 processors) and Windows Server 2012, detectCores() with NUMA disabled only returns the maximum value of 64. If NUMA is enabled with 4 nodes (44 processors each), then detectCores() only returns 44. This is particularly limiting, since in this case we cannot use all processors whether NUMA is enabled or disabled.
>>
>> We think this is because R's ncpus.c file uses "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION" (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683194(v=vs.85).aspx) instead of "PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX" (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx). Specifically, quoting from the first link:
>>
>> "On systems with more than 64 logical processors, the GetLogicalProcessorInformation function retrieves logical processor information about processors in the processor group (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405503(v=vs.85).aspx) to which the calling thread is currently assigned. Use the GetLogicalProcessorInformationEx (https://msdn.microsoft.com/en-us/library/windows/desktop/dd405488(v=vs.85).aspx) function to retrieve information about processors in all processor groups on the system."
>>
>> Therefore, it might be possible to get the right count of total processors even with NUMA enabled by using "GetLogicalProcessorInformationEx". It'd be nice to know what you think.
>>
>> Thank you very much,
>> Arun.
>>
>> --
>> Arun Srinivasan
>> Analyst, Millennium Management LLC
>> 50 Berkeley Street | London, W1J 8HD
>>
